Machine learning (ML) models can memorize their training datasets. As a result, training ML models on private datasets can violate individuals' privacy. Differential privacy (DP) is a rigorous privacy notion for protecting the privacy of the underlying training datasets. Yet training ML models in a DP framework usually degrades their accuracy. This paper aims to boost the accuracy of a DP logistic regression (LR) model via a pre-training module. Specifically, we first pre-train our LR model on a public training dataset for which there are no privacy concerns. Then, we fine-tune our DP-LR model on the private dataset. In the numerical results, we show that adding a pre-training module significantly improves the accuracy of the DP-LR model.
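A minimal sketch of the two-stage recipe, assuming DP-SGD (per-example gradient clipping plus Gaussian noise) as the private training procedure. The abstract does not name the DP mechanism; the function names, clipping norm, noise multiplier, and synthetic data below are illustrative assumptions. Privacy accounting (turning the noise multiplier into an (ε, δ) guarantee) is omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_lr(X, y, w, epochs=200, lr=0.5):
    """Non-private full-batch gradient descent on the logistic loss (public pre-training)."""
    for _ in range(epochs):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)
        w = w - lr * grad
    return w

def dp_sgd_lr(X, y, w, epochs=20, lr=0.5, batch=64, clip=1.0, noise_mult=1.0, rng=None):
    """Assumed DP mechanism (DP-SGD): clip each per-example gradient, then add Gaussian noise."""
    rng = np.random.default_rng(0) if rng is None else rng
    n = len(y)
    for _ in range(epochs):
        for idx in np.array_split(rng.permutation(n), max(1, n // batch)):
            g = X[idx] * (sigmoid(X[idx] @ w) - y[idx])[:, None]   # per-example gradients
            norms = np.linalg.norm(g, axis=1, keepdims=True)
            g = g / np.maximum(1.0, norms / clip)                   # clip each row to L2 norm <= clip
            noise = rng.normal(0.0, noise_mult * clip, size=w.shape)
            w = w - lr * (g.sum(axis=0) + noise) / len(idx)
    return w

def accuracy(X, y, w):
    return float(np.mean((sigmoid(X @ w) > 0.5) == y))

# Synthetic stand-ins for the public and private datasets (hypothetical data).
rng = np.random.default_rng(42)
true_w = np.array([2.0, -3.0, 1.0])

def make_data(n):
    X = np.hstack([rng.normal(size=(n, 2)), np.ones((n, 1))])   # last column is the bias feature
    y = (sigmoid(X @ true_w) > rng.uniform(size=n)).astype(float)
    return X, y

X_pub, y_pub = make_data(2000)
X_priv, y_priv = make_data(500)
X_test, y_test = make_data(2000)

w_scratch = dp_sgd_lr(X_priv, y_priv, np.zeros(3), rng=np.random.default_rng(1))
w_pre = train_lr(X_pub, y_pub, np.zeros(3))                  # stage 1: public, no noise
w_fine = dp_sgd_lr(X_priv, y_priv, w_pre, rng=np.random.default_rng(1))   # stage 2: private, DP
print(accuracy(X_test, y_test, w_scratch), accuracy(X_test, y_test, w_fine))
```

Only the fine-tuning stage touches private data, so only it needs clipping and noise; the public pre-training stage can use ordinary optimization and simply supplies a better starting point.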
Blindness and other eye diseases are a global health concern, particularly in low- and middle-income countries such as India. During the COVID-19 pandemic, teleophthalmology became a lifeline, and the Grabi attachment for smartphone-based eye imaging gained wider use. However, the quality of user-captured images often remained inadequate, requiring clinician vetting and causing delays. Against this backdrop, we propose an AI-based quality assessment system that gives instant feedback mimicking clinicians' judgments, and we test it on patient-captured images. Dividing the complex problem hierarchically, we tackle a nontrivial part of it here and demonstrate a proof of concept.
Graphon games have been introduced to study games with many players who interact through a weighted interaction graph. By passing to the limit, one obtains a game with a continuum of players, in which the interactions are through a graphon. In this paper, we focus on a graphon game for optimal investment under relative performance criteria, and we propose a deep learning method. The method builds upon two key ingredients: first, a characterization of Nash equilibria by forward-backward stochastic differential equations and, second, recent advances in machine learning algorithms for stochastic differential games. We provide numerical experiments on two different financial models. In each model, we compare the effect of several graphons, which correspond to different interaction structures.
We study the asymptotic behavior of second-order algorithms mixing Newton's method and inertial gradient descent in non-convex landscapes. We show that, despite the Newtonian behavior of these methods, they almost always escape strict saddle points. We also highlight the role played by the hyper-parameters of these methods in their qualitative behavior near critical points. The theoretical results are supported by numerical illustrations.
We consider bounded discrete time series. From their statistical features, without any use of the Fourier transform, we find a suitable almost periodic function that approximates the corresponding time series on a local time interval.
We investigate the test risk of continuous-time stochastic gradient flow dynamics in learning theory. Using a path integral formulation we provide, in the regime of a small learning rate, a general formula for computing the difference between test risk curves of pure gradient and stochastic gradient flows. We apply the general theory to a simple model of weak features, which displays the double descent phenomenon, and explicitly compute the corrections brought about by the added stochastic term in the dynamics, as a function of time and model parameters. The analytical results are compared to simulations of discrete-time stochastic gradient descent and show good agreement.
Different diseases, such as histological subtypes of breast lesions, have widely varying incidence rates. Even when trained with a substantial amount of in-distribution (ID) data, models often encounter out-of-distribution (OOD) samples belonging to unseen classes in clinical practice. To address this, we propose a novel framework built upon a long-tailed OOD detection task for breast ultrasound images. It is equipped with a triplet state augmentation (TriAug), which improves ID classification accuracy while maintaining promising OOD detection performance. Meanwhile, we design a balanced sphere loss to handle the class imbalance problem.
This research explores the integration of language embeddings for active learning in autonomous driving datasets, with a focus on novelty detection. Novelty arises from unexpected scenarios that autonomous vehicles struggle to navigate, necessitating higher-level reasoning abilities. Our proposed method employs language-based representations to identify novel scenes, emphasizing the dual purpose of safety takeover responses and active learning. The research presents a clustering experiment using Contrastive Language-Image Pretrained (CLIP) embeddings to organize datasets and detect novelties. We find that the proposed algorithm effectively isolates novel scenes from a collection of subsets derived from two real-world driving datasets, one vehicle-mounted and one infrastructure-mounted. From the generated clusters, we further present methods for generating textual explanations of elements which differentiate scenes classified as novel from other scenes in the data pool, presenting qualitative examples from the clustered results. Our results demonstrate the effectiveness of language-driven embeddings in identifying novel elements and generating explanations of data, and we further discuss potential applications in safe takeovers, data curation, and multi-task active learning.
The advancement of Large Language Models (LLMs) has resulted in a corresponding proliferation of their applications. Software design is one such application, and it has benefited greatly from using LLMs as an interface component that extends fixed user stories. However, including LLM-based AI agents in software design often poses unexpected challenges, especially in estimating development effort. Using the example of UI-based user stories, we provide a comparison against traditional methods and propose a new way to enhance the specifications of natural language-based questions, which allows development effort to be estimated by taking into account data sources, interfaces and algorithms.
This work proposes a class of locally differentially private mechanisms for linear queries, in particular range queries, that leverage correlated input perturbation to simultaneously achieve unbiasedness, consistency, statistical transparency, and control over utility requirements, with accuracy targets expressed either in certain query margins or as implied by the hierarchical database structure. The proposed Cascade Sampling algorithm instantiates the mechanism exactly and efficiently. Our bounds show that we obtain near-optimal utility, while the mechanism is empirically competitive against output perturbation methods.
This paper introduces a novel generative model for discrete distributions based on continuous normalizing flows on the submanifold of factorizing discrete measures. Integration of the flow gradually assigns categories and avoids the issues of discretizing a latent continuous model, such as rounding and sample truncation. General non-factorizing discrete distributions, capable of representing complex statistical dependencies of structured discrete data, can be approximated by embedding the submanifold into the meta-simplex of all joint discrete distributions and data-driven averaging. Efficient training of the generative model is demonstrated by matching the flow of geodesics of factorizing discrete distributions. Various experiments underline the approach's broad applicability.
We propose a new algorithm for model-based distributional reinforcement learning (RL), and prove that it is minimax-optimal for approximating return distributions with a generative model (up to logarithmic factors), resolving an open question of Zhang et al. (2023). Our analysis provides new theoretical results on categorical approaches to distributional RL, and also introduces a new distributional Bellman equation, the stochastic categorical CDF Bellman equation, which we expect to be of independent interest. We also provide an experimental study comparing several model-based distributional RL algorithms, with several takeaways for practitioners.
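Categorical approaches to distributional RL represent a return distribution by probabilities on a fixed, evenly spaced grid of atoms. As background on that representation only (not the paper's algorithm or its stochastic categorical CDF Bellman equation), the sketch below applies a one-step distributional Bellman target r + γZ and projects it back onto the grid with the standard categorical projection, which splits each shifted atom's mass between its two nearest grid points. The grid, reward, and discount are assumed values.

```python
import numpy as np

def categorical_projection(atoms, probs, reward, gamma):
    """Project the distribution of reward + gamma * Z onto the fixed, evenly spaced grid `atoms`."""
    v_min, v_max = atoms[0], atoms[-1]
    dz = atoms[1] - atoms[0]
    target = np.clip(reward + gamma * atoms, v_min, v_max)   # shifted atoms, clipped to the grid range
    b = (target - v_min) / dz                                  # fractional grid index of each shifted atom
    lower = np.floor(b).astype(int)
    upper = np.ceil(b).astype(int)
    out = np.zeros_like(probs)
    # linear interpolation of mass; if b is an integer (lower == upper) all mass goes to that atom
    np.add.at(out, lower, probs * (upper - b + (lower == upper)))
    np.add.at(out, upper, probs * (b - lower))
    return out

atoms = np.linspace(0.0, 10.0, 11)       # grid 0, 1, ..., 10
probs = np.zeros(11)
probs[5] = 1.0                           # point mass at return 5.0
proj = categorical_projection(atoms, probs, reward=1.0, gamma=0.9)
print(proj[5], proj[6])
```

Here the target 1 + 0.9 · 5 = 5.5 lies halfway between atoms 5 and 6, so the mass is split equally; the projection preserves the mean whenever no clipping occurs.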
We present General Time Transformer (GTT), an encoder-only style foundation model for zero-shot multivariate time series forecasting. GTT is pretrained on a large dataset of 200M high-quality time series samples spanning diverse domains. In our proposed framework, the task of multivariate time series forecasting is formulated as a channel-wise next curve shape prediction problem, where each time series sample is represented as a sequence of non-overlapping curve shapes with a unified numerical magnitude. GTT is trained to predict the next curve shape based on a window of past curve shapes in a channel-wise manner. Experimental results demonstrate that GTT exhibits superior zero-shot multivariate forecasting capabilities on unseen time series datasets, even surpassing state-of-the-art supervised baselines. Additionally, we investigate the impact of varying GTT model parameters and training dataset scales, observing that scaling laws also hold in the context of zero-shot multivariate time series forecasting.
We present a novel deep-learning-based method to cluster words in documents which we apply to detect and recognize tables given the OCR output. We interpret table structure bottom-up as a graph of relations between pairs of words (belonging to the same row, column, header, as well as to the same table) and use a transformer encoder model to predict its adjacency matrix. We demonstrate the performance of our method on the PubTables-1M dataset as well as PubTabNet and FinTabNet datasets. Compared to the current state-of-the-art detection methods such as DETR and Faster R-CNN, our method achieves similar or better accuracy, while requiring a significantly smaller model.
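The model predicts an adjacency matrix over word pairs for each relation (same row, column, header, or table). One simple way to turn such a prediction into word groups, sketched here under our own assumptions, is to threshold the predicted probabilities, symmetrize, and take connected components. The threshold and the toy probability matrix are hypothetical, and the paper's actual decoding step may differ.

```python
import numpy as np
from collections import deque

def clusters_from_adjacency(probs, threshold=0.5):
    """Group words (e.g. into table rows) from a predicted pairwise probability matrix."""
    adj = probs >= threshold
    adj = adj | adj.T                       # treat the relation as symmetric
    n = len(adj)
    label = [-1] * n
    clusters = []
    for start in range(n):
        if label[start] != -1:
            continue
        label[start] = len(clusters)
        members, queue = [], deque([start])
        while queue:                         # breadth-first search over one connected component
            i = queue.popleft()
            members.append(i)
            for j in np.flatnonzero(adj[i]):
                if label[j] == -1:
                    label[j] = len(clusters)
                    queue.append(int(j))
        clusters.append(sorted(members))
    return clusters

# Hypothetical "same row" probabilities for 5 OCR words: words 0-2 form one row, 3-4 another.
p = np.array([
    [1.0, 0.9, 0.2, 0.1, 0.0],
    [0.9, 1.0, 0.8, 0.0, 0.1],
    [0.2, 0.8, 1.0, 0.1, 0.0],
    [0.1, 0.0, 0.1, 1.0, 0.7],
    [0.0, 0.1, 0.0, 0.7, 1.0],
])
print(clusters_from_adjacency(p))   # [[0, 1, 2], [3, 4]]
```

Connected components make the grouping transitive: words 0 and 2 end up in the same row through word 1 even though their direct score (0.2) is below the threshold.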
This is an expository article on score-based diffusion models, with a particular focus on the formulation via stochastic differential equations (SDEs). After a gentle introduction, we discuss the two pillars of diffusion modeling: sampling and score matching, which encompass SDE/ODE sampling, score matching efficiency, the consistency model, and reinforcement learning. Short proofs are given to illustrate the main ideas of the stated results. The article is primarily intended to introduce beginners to the field, and practitioners may also find some of the analysis useful in designing new models or algorithms.
A significant challenge in multi-objective reinforcement learning is obtaining a Pareto front of policies that attain optimal performance under different preferences. We introduce Iterated Pareto Referent Optimisation (IPRO), a principled algorithm that decomposes the task of finding the Pareto front into a sequence of single-objective problems for which various solution methods exist. This enables us to establish convergence guarantees while providing an upper bound on the distance to undiscovered Pareto optimal solutions at each step. Empirical evaluations demonstrate that IPRO matches or outperforms methods that require additional domain knowledge. By leveraging problem-specific single-objective solvers, our approach also holds promise for applications beyond multi-objective reinforcement learning, such as in pathfinding and optimisation.
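IPRO targets the Pareto front, i.e. the set of value vectors that no other attainable vector dominates. For readers new to the notion, a minimal sketch of Pareto dominance and of filtering a finite set of candidate policy values (assuming every objective is maximised) follows; it illustrates the target set only, not IPRO's decomposition into single-objective problems.

```python
def dominates(a, b):
    """True if vector a Pareto-dominates b: at least as good everywhere, strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(points):
    """Keep the points that no other point dominates (duplicates kept once)."""
    front = []
    for p in points:
        if not any(dominates(q, p) for q in points) and p not in front:
            front.append(p)
    return front

# Hypothetical two-objective values of five policies.
values = [(1.0, 5.0), (2.0, 4.0), (2.0, 3.0), (3.0, 1.0), (0.5, 5.0)]
print(pareto_front(values))   # [(1.0, 5.0), (2.0, 4.0), (3.0, 1.0)]
```

Each point on the front is the best available trade-off for some preference over the objectives, which is why recovering the whole front matters when preferences are not known in advance.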
Online linear programming plays an important role in both revenue management and resource allocation, and recent research has focused on developing efficient first-order online learning algorithms. Despite the empirical success of first-order methods, they typically achieve a regret no better than $\mathcal{O}(\sqrt{T})$, which is suboptimal compared to the $\mathcal{O}(\log T)$ bound guaranteed by the state-of-the-art linear programming (LP)-based online algorithms. This paper establishes several important facts about online linear programming, which reveal the challenge that first-order-method-based online algorithms face in achieving regret better than $\mathcal{O}(\sqrt{T})$. To address the challenge, we introduce a new algorithmic framework that decouples learning from decision-making. More importantly, for the first time, we show that first-order methods can attain regret $\mathcal{O}(T^{1/3})$ with this new framework. Lastly, we conduct numerical experiments to validate our theoretical findings.
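For context on the first-order methods the abstract contrasts with, a common baseline in the online LP literature keeps a dual price vector p for the resources, accepts an order when its reward exceeds the priced resource cost, and updates p by a projected subgradient step toward the per-period budget d = b/T. The sketch assumes the standard formulation (maximise Σ r_t x_t subject to Σ a_t x_t ≤ b, x_t ∈ {0, 1}); the step size and data are illustrative, and this is the O(√T)-type baseline, not the paper's decoupled framework.

```python
import numpy as np

def online_lp_dual_descent(rewards, costs, budget, step=None):
    """Accept/reject orders online with a dual (price) subgradient method.

    rewards: shape (T,); costs: shape (T, m); budget: shape (m,) total resources.
    """
    T, m = costs.shape
    d = budget / T                                    # per-period resource budget
    step = 1.0 / np.sqrt(T) if step is None else step
    p = np.zeros(m)                                   # dual prices, one per resource
    remaining = np.array(budget, dtype=float)
    x = np.zeros(T)
    for t in range(T):
        a_t = costs[t]
        # accept when the reward beats the priced resource cost and the order still fits
        if rewards[t] > a_t @ p and np.all(a_t <= remaining):
            x[t] = 1.0
            remaining -= a_t
        # projected subgradient step: prices rise for resources consumed faster than d
        p = np.maximum(p + step * (a_t * x[t] - d), 0.0)
    return x

rng = np.random.default_rng(0)
T, m = 1000, 2
rewards = rng.uniform(0, 1, T)
costs = rng.uniform(0, 1, (T, m))
budget = np.full(m, 0.25 * T)
x = online_lp_dual_descent(rewards, costs, budget)
print(x.sum(), costs.T @ x, rewards @ x)
```

Here the learning (price updates) and the decisions are driven by the same sequence of steps; the abstract's new framework decouples the two.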
This work compares the performance of four clustering techniques with the objective of building strong hybrid models for supervised learning tasks. A real dataset has been collected from a bio-climatic house named Sotavento, located on an experimental wind farm in Xermade (Lugo), Galicia (Spain). The authors have chosen the thermal solar generation system to study how it performs when several clustering methods are applied, each followed by a regression technique, to predict the output temperature of the system. To assess the quality of each clustering method, two approaches have been implemented. The first is based on three unsupervised learning metrics (Silhouette, Calinski-Harabasz and Davies-Bouldin), while the second employs the most common error measurements for a regression algorithm such as a Multi-Layer Perceptron.
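A minimal sketch of the cluster-then-regress idea, under stated assumptions: plain k-means stands in for the four clustering techniques, a per-cluster linear least-squares model replaces the Multi-Layer Perceptron, and only the Davies-Bouldin index (lower is better) of the three named metrics is implemented. The synthetic two-regime data is hypothetical, not the Sotavento dataset.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns labels and centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # keep the old centroid if a cluster becomes empty
        centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                              for j in range(k)])
    return labels, centroids

def davies_bouldin(X, labels, centroids):
    """Davies-Bouldin index: mean over clusters of the worst (scatter_i + scatter_j) / distance_ij."""
    k = len(centroids)
    scatter = np.array([np.linalg.norm(X[labels == j] - centroids[j], axis=1).mean() for j in range(k)])
    worst = [max((scatter[i] + scatter[j]) / np.linalg.norm(centroids[i] - centroids[j])
                 for j in range(k) if j != i) for i in range(k)]
    return float(np.mean(worst))

def fit_hybrid(X, y, k):
    """Hybrid model: cluster the inputs, then fit one linear least-squares model per cluster."""
    labels, centroids = kmeans(X, k)
    Xb = np.hstack([X, np.ones((len(X), 1))])        # add a bias column
    coefs = [np.linalg.lstsq(Xb[labels == j], y[labels == j], rcond=None)[0] for j in range(k)]
    return centroids, coefs

def predict_hybrid(X, centroids, coefs):
    """Route each sample to its nearest centroid's regression model."""
    labels = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2).argmin(axis=1)
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.array([Xb[i] @ coefs[labels[i]] for i in range(len(X))])

# Hypothetical data with two operating regimes, each linear in the input.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-5, 0.5, 100), rng.normal(5, 0.5, 100)])[:, None]
y = np.where(x[:, 0] < 0, 2 * x[:, 0] + 1, -x[:, 0] + 3)
labels, centroids = kmeans(x, 2)
print(davies_bouldin(x, labels, centroids))
centroids, coefs = fit_hybrid(x, y, 2)
pred = predict_hybrid(x, centroids, coefs)
```

A low Davies-Bouldin value signals compact, well-separated clusters, which is what lets each per-cluster regressor specialise on one regime.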
Several methods have been proposed for correcting the elevation bias in digital elevation models (DEMs), for example, linear regression. Nowadays, supervised machine learning enables the modelling of complex relationships between variables, and has been deployed by researchers in a variety of fields. In the existing literature, several studies have adopted either machine learning or statistical approaches for DEM correction. However, to our knowledge, none of these studies has compared the performance of both approaches, especially with regard to open-access global DEMs. Our previous work has already shown the potential of machine learning approaches, specifically gradient boosted decision trees (GBDTs), for DEM correction. In this study, we share some results from the comparison of three recent implementations of gradient boosted decision trees (XGBoost, LightGBM and CatBoost) versus multiple linear regression (MLR) for enhancing the vertical accuracy of 30 m Copernicus and AW3D global DEMs in Cape Town, South Africa.
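Multiple linear regression, the statistical baseline in this comparison, can be used to model the DEM error (DEM elevation minus reference elevation) as a linear function of predictor variables and subtract the prediction. A minimal sketch with NumPy least squares follows; the predictors (slope, canopy height) and all values are hypothetical, not the study's covariates or data, and the fit is evaluated in-sample for brevity (a real study would use held-out reference points).

```python
import numpy as np

def fit_mlr(features, error):
    """Least-squares fit of error ~ b0 + b1*f1 + ... + bk*fk."""
    A = np.hstack([np.ones((len(features), 1)), features])
    coef, *_ = np.linalg.lstsq(A, error, rcond=None)
    return coef

def correct_dem(dem, features, coef):
    """Subtract the predicted vertical error from the DEM elevations."""
    A = np.hstack([np.ones((len(features), 1)), features])
    return dem - A @ coef

def rmse(z, reference):
    return float(np.sqrt(np.mean((z - reference) ** 2)))

rng = np.random.default_rng(0)
n = 500
slope = rng.uniform(0, 30, n)           # hypothetical predictor: terrain slope (degrees)
canopy = rng.uniform(0, 15, n)          # hypothetical predictor: canopy height (m)
reference = rng.uniform(0, 1000, n)     # reference elevations (m), e.g. from survey points
dem = reference + 1.5 + 0.05 * slope + 0.6 * canopy + rng.normal(0, 0.5, n)   # synthetic biased DEM
X = np.column_stack([slope, canopy])
coef = fit_mlr(X, dem - reference)
corrected = correct_dem(dem, X, coef)
print(rmse(dem, reference), rmse(corrected, reference))
```

Gradient boosted trees would replace `fit_mlr` with a non-linear regressor on the same target (the elevation error), which is what makes the two families directly comparable.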
Using electronic health record data and machine learning to guide future decisions requires addressing several challenges, including 1) long/short-term dependencies and 2) interactions between diseases and interventions. Bidirectional transformers have effectively addressed the first challenge. Here we tackle the latter challenge by masking one source (e.g., ICD10 codes) and training the transformer to predict it from the other sources (e.g., ATC codes).
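The masking scheme can be illustrated on a toy patient record: every token from the masked source (here ICD10 codes) is replaced by a mask token and becomes a prediction target, while tokens from the other source (ATC codes) remain visible as input. The record, the `[MASK]` symbol, and the helper name are hypothetical, and the transformer itself is omitted.

```python
MASK = "[MASK]"   # hypothetical mask token

def mask_source(record, masked_source):
    """Return (inputs, targets): tokens from masked_source are hidden and become targets.

    record: list of (source, code) pairs in visit order.
    """
    inputs, targets = [], []
    for source, code in record:
        if source == masked_source:
            inputs.append(MASK)
            targets.append(code)      # the model must recover this from the other sources
        else:
            inputs.append(code)
            targets.append(None)      # visible token, not a training target
    return inputs, targets

# Toy record: diagnoses (ICD10) interleaved with prescriptions (ATC).
record = [("ICD10", "E11"), ("ATC", "A10BA02"), ("ICD10", "I10"), ("ATC", "C09AA05")]
inputs, targets = mask_source(record, "ICD10")
print(inputs)    # ['[MASK]', 'A10BA02', '[MASK]', 'C09AA05']
print(targets)   # ['E11', None, 'I10', None]
```

Unlike random token masking, hiding an entire source forces the model to learn cross-source associations, e.g. that a prescription of metformin (A10BA02) points to type 2 diabetes (E11).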
Perhaps the greatest challenge – and opportunity – of LLMs is extending their powerful capabilities to solve problems beyond the data on which they have been trained, and to achieve comparable results with data the LLM has never seen. This opens new possibilities in data investigation, such as identifying themes and semantic concepts with context […]
The post GraphRAG: Unlocking LLM discovery on narrative private data appeared first on Microsoft Research.
This post is co-written with Santosh Waddi and Nanda Kishore Thatikonda from BigBasket. BigBasket is India’s largest online food and grocery store. They operate in multiple ecommerce channels such as quick commerce, slotted delivery, and daily subscriptions. You can also buy from their physical stores and vending machines. They offer a large assortment of over […]
Amazon SageMaker Feature Store is a fully managed, purpose-built repository to store, share, and manage features for machine learning (ML) models. Features are inputs to ML models used during training and inference. For example, in an application that recommends a music playlist, features could include song ratings, listening duration, and listener demographics. Features are used […]
Chatbots are used by millions of people around the world every day, powered by NVIDIA GPU-based cloud servers. Now, these groundbreaking tools are coming to Windows PCs powered by NVIDIA RTX for local, fast, custom generative AI. Chat with RTX, now free to download, is a tech demo that lets users personalize a chatbot with
Researchers developed a simple yet effective solution for a puzzling problem that can worsen the performance of large language models such as ChatGPT.
We propose a novel method for privacy-preserving deep neural networks (DNNs) with the Vision Transformer (ViT). The method allows us not only to train and test models with visually protected images but also to avoid the performance degradation caused by the use of encrypted images, whereas conventional methods cannot avoid the influence of image encryption. A domain adaptation method is used to efficiently fine-tune ViT with encrypted images. In experiments, the method is shown to outperform conventional methods in classification accuracy on an image classification task using the CIFAR-10 and ImageNet datasets.
We demonstrate a multiplication method based on numbers represented as a set of polynomial radix 2 indices stored as an integer list. The 'polynomial integer index multiplication' method is a set of algorithms implemented in Python code. We demonstrate the method to be faster than both the Number Theoretic Transform (NTT) and Karatsuba for multiplication within a certain bit range; both of these were also implemented in Python code for comparison with the polynomial radix 2 integer method. We demonstrate that it is possible to express any integer or real number as a list of integer indices, representing a finite series in base two. The finite series of the integer index representation of a number can then be stored and distributed across multiple CPUs / GPUs. We show that the operations of addition and multiplication can be applied as two's complement additions operating on the index integer representations and can be fully distributed across a given CPU / GPU architecture. We demonstrate fully distributed arithmetic operations such that the 'polynomial integer index multiplication' method overcomes the current limitation of parallel multiplication methods, i.e., the need to share common core memory and common disk for the calculation of results and intermediate results.
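The index representation stores a non-negative integer as the list of exponents of its set bits, e.g. 13 = 2^3 + 2^2 + 2^0 gives [3, 2, 0]. Multiplying two such numbers then reduces to adding every pair of indices (2^i · 2^j = 2^(i+j)) and resolving repeated indices by carrying (2^k + 2^k = 2^(k+1)). A minimal single-process sketch of that idea; it is our illustration, not the authors' implementation, and it attempts neither the distribution across CPUs/GPUs nor any performance comparison with NTT or Karatsuba.

```python
from collections import Counter

def to_indices(n):
    """Exponents of the set bits of a non-negative integer, highest first."""
    return [i for i in range(n.bit_length() - 1, -1, -1) if (n >> i) & 1]

def from_indices(idx):
    """Rebuild the integer from its list of bit indices."""
    return sum(1 << i for i in idx)

def normalize(counts):
    """Carry repeated indices: c copies of 2^k become (c % 2) * 2^k plus (c // 2) * 2^(k+1)."""
    counts = Counter(counts)
    k = min(counts) if counts else 0
    result = []
    while counts:
        c = counts.pop(k, 0)
        if c % 2:
            result.append(k)
        if c // 2:
            counts[k + 1] += c // 2
        k += 1
    return sorted(result, reverse=True)

def multiply_indices(a, b):
    """Multiply two numbers given as index lists: every pair contributes 2^(i + j)."""
    return normalize(Counter(i + j for i in a for j in b))

a, b = to_indices(13), to_indices(11)
print(a, b)                                      # [3, 2, 0] [3, 1, 0]
print(multiply_indices(a, b))                    # [7, 3, 2, 1, 0]
print(from_indices(multiply_indices(a, b)))      # 143
```

The pairwise index sums are independent of one another, which is the property that makes it plausible to split the work across devices before a final carry pass.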
This paper considers the robust phase retrieval problem, which can be cast as a nonsmooth and nonconvex optimization problem. We propose a new inexact proximal linear algorithm in which the subproblem is solved inexactly. Our contributions are two adaptive stopping criteria for the subproblem. The convergence behavior of the proposed methods is analyzed. Through experiments on both synthetic and real datasets, we demonstrate that our methods are much more efficient than existing methods, such as the original proximal linear algorithm and the subgradient method.
We show that the Rademacher complexity-based approach can generate non-vacuous generalisation bounds on Convolutional Neural Networks (CNNs) for classifying a small number of classes of images. A key technical contribution is the development of new Talagrand contraction lemmas for high-dimensional mappings between function spaces and for CNNs with general Lipschitz activation functions. Our results show that the Rademacher complexity does not depend on the network length for CNNs with some special types of activation functions, such as ReLU, Leaky ReLU, Parametric Rectified Linear Unit, Sigmoid, and Tanh.
We study the online learnability of hypothesis classes with respect to arbitrary, but bounded loss functions. No characterization of online learnability is known at this level of generality. We give a new scale-sensitive combinatorial dimension, named the sequential minimax dimension, and show that it gives a tight quantitative characterization of online learnability. In addition, we show that the sequential minimax dimension subsumes most existing combinatorial dimensions in online learning theory.
We introduce a new dataset named WikiVitals, which contains a large graph of 48k mutually referencing Wikipedia articles classified into 32 categories and connected by 2.3M edges. Our aim is to rigorously evaluate the contributions of three distinct sources of information to label prediction in a semi-supervised node classification setting, namely the content of the articles, their connections with each other, and the correlations among their labels. We perform this evaluation using a Graph Markov Neural Network, which provides a theoretically principled model for this task, and we conduct a detailed evaluation of the contribution of each source of information using a clear separation of model selection and model assessment. One interesting observation is that including the effect of label dependencies is more relevant for sparse training sets than for dense ones.
3D building models with facade details now play an important role in many applications. Classifying point clouds at the facade level is key to creating such digital replicas of the real world. However, few studies have focused on such detailed classification with deep neural networks. We propose a method that fuses geometric features with deep learning networks for point cloud classification at the facade level. Our experiments show that such early-fused features improve the performance of deep learning methods. This method can be applied to compensate for deep learning networks' limited ability to capture local geometric information and to promote the advancement of semantic segmentation.
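The abstract does not list which geometric features are fused. A common choice in point-cloud work, assumed here, is eigenvalue-based descriptors of each point's local neighbourhood: with covariance eigenvalues λ1 ≥ λ2 ≥ λ3, linearity = (λ1 − λ2)/λ1, planarity = (λ2 − λ3)/λ1 and scattering = λ3/λ1. A minimal sketch that computes them per point (these could then be concatenated to the network's input coordinates as early-fused features); the neighbourhood size k and the synthetic wall/edge points are illustrative.

```python
import numpy as np

def eigen_features(points, k=10):
    """Per-point linearity, planarity and scattering from the k nearest neighbours' covariance."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)   # O(n^2), fine for a demo
    feats = np.zeros((len(points), 3))
    for i in range(len(points)):
        nbrs = points[np.argsort(d[i])[:k]]                       # includes the point itself
        lam = np.sort(np.linalg.eigvalsh(np.cov(nbrs.T)))[::-1]  # l1 >= l2 >= l3
        lam = np.maximum(lam, 0.0)                                # clip tiny negative round-off
        l1 = max(lam[0], 1e-12)
        feats[i] = [(lam[0] - lam[1]) / l1, (lam[1] - lam[2]) / l1, lam[2] / l1]
    return feats

rng = np.random.default_rng(0)
wall = np.column_stack([rng.uniform(0, 5, 200), np.zeros(200), rng.uniform(0, 3, 200)])  # planar facade patch
edge = np.column_stack([rng.uniform(0, 5, 200), np.zeros(200), np.zeros(200)])          # linear edge
f_wall, f_edge = eigen_features(wall), eigen_features(edge)
print(f_wall.mean(axis=0), f_edge.mean(axis=0))
```

Points on a flat wall get near-zero scattering, while points on an edge get linearity close to one, giving the network explicit local-shape cues it would otherwise have to learn.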
We present a self-contained proof of the convergence rate of Stochastic Gradient Descent (SGD) when the learning rate follows an inverse time decay schedule; we then apply the results to the convergence of a modified form of policy gradient Multi-Armed Bandit (MAB) with $L_2$ regularization.
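An inverse time decay schedule sets the learning rate to η_t = η_0 / (1 + t/τ), so steps shrink roughly like 1/t for large t. A minimal sketch on a noisy one-dimensional quadratic whose minimiser is known; η_0, τ, the objective, and the Gaussian gradient-noise model are illustrative assumptions, not the paper's setting.

```python
import numpy as np

def sgd_inverse_time(grad, x0, eta0=0.5, tau=10.0, steps=5000, seed=0):
    """SGD with learning rate eta_t = eta0 / (1 + t / tau)."""
    rng = np.random.default_rng(seed)
    x = x0
    for t in range(steps):
        eta = eta0 / (1.0 + t / tau)
        x = x - eta * grad(x, rng)
    return x

def noisy_grad(x, rng):
    """Stochastic gradient of f(x) = 0.5 * (x - 3)^2 with additive N(0, 1) noise; minimiser x* = 3."""
    return (x - 3.0) + rng.normal(0.0, 1.0)

x_final = sgd_inverse_time(noisy_grad, x0=0.0)
print(abs(x_final - 3.0))
```

With a constant step the iterate would keep fluctuating at a noise floor proportional to the step size; the decaying schedule averages the noise out, so the error keeps shrinking as t grows.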
We propose a novel nonparametric sequential test for composite hypotheses for means of multiple data streams. Our proposed method, \emph{peeking with expectation-based averaged capital} (PEAK), builds upon the testing-as-betting framework and provides a non-asymptotic $\alpha$-level test across any stopping time. PEAK is computationally tractable and efficiently rejects hypotheses that are incorrect across all potential distributions that satisfy our nonparametric assumption, enabling joint composite hypothesis testing on multiple streams of data. We numerically validate our theoretical findings under the best arm identification and threshold identification in the bandit setting, illustrating the computational efficiency of our method against state-of-the-art testing methods.
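PEAK builds on the testing-as-betting framework. As background only (not PEAK itself, and without its averaging of bets or its handling of multiple streams), here is a minimal sketch for a single stream with values in [0, 1] and H0: mean = m0. A bettor's wealth K_t = Π(1 + λ(x_i − m0)) is a nonnegative martingale under H0, so by Ville's inequality rejecting once K_t ≥ 1/α gives an α-level test that stays valid at any stopping time. The fixed bet λ, α, and the Bernoulli data are assumptions.

```python
import numpy as np

def betting_test(xs, m0, lam=0.5, alpha=0.05):
    """Sequential test of H0: E[X] = m0 for X in [0, 1]; returns the rejection time or None."""
    if not -1.0 / (1.0 - m0) <= lam <= 1.0 / m0:
        raise ValueError("bet size must keep wealth nonnegative for all x in [0, 1]")
    wealth = 1.0
    for t, x in enumerate(xs, start=1):
        wealth *= 1.0 + lam * (x - m0)   # fair bet under H0, so wealth is a martingale
        if wealth >= 1.0 / alpha:         # Ville: P(sup_t K_t >= 1/alpha) <= alpha under H0
            return t
    return None

rng = np.random.default_rng(0)
print(betting_test(rng.binomial(1, 0.7, 2000), m0=0.5))   # true mean 0.7: H0 is false
print(betting_test(rng.binomial(1, 0.5, 2000), m0=0.5))   # true mean 0.5: H0 is true
```

Because the guarantee holds uniformly over time, the analyst may peek after every observation and stop as soon as the wealth crosses the threshold without inflating the error rate.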
In this paper, we present a fully automatic brain tumor segmentation and classification model using a Deep Convolutional Neural Network with a multiscale approach. One difference between our proposal and previous works is that input images are processed at three spatial scales along different processing pathways. This mechanism is inspired by the inherent operation of the Human Visual System. The proposed neural model can analyze MRI images containing three types of tumors (meningioma, glioma, and pituitary tumor) in sagittal, coronal, and axial views, and does not need preprocessing of input images to remove skull or vertebral column parts in advance. The performance of our method on a publicly available MRI image dataset of 3064 slices from 233 patients is compared with previously published classical machine learning and deep learning methods. In the comparison, our method achieved a remarkable tumor classification accuracy of 0.973, higher than the other approaches using the same database.
The use of a wide range of computer vision solutions and, more recently, high-end Inertial Measurement Units (IMUs) has become increasingly popular for assessing human physical activity in clinical and research settings. Nevertheless, to increase the feasibility of patient tracking in out-of-the-lab settings, it is necessary to use a reduced number of devices for movement acquisition. Promising solutions in this context are IMU-based wearables and single-camera systems. Additionally, machine learning systems able to recognize and digest clinically relevant data in the wild need to be developed, and therefore determining the ideal input to those systems is crucial.
We study the problem of learning to predict the next state of a dynamical system when the underlying evolution function is unknown. Unlike previous work, we place no parametric assumptions on the dynamical system, and study the problem from a learning theory perspective. We define new combinatorial measures and dimensions and show that they quantify the optimal mistake and regret bounds in the realizable and agnostic setting respectively.
Time series analysis is relevant in various disciplines such as physics, biology, chemistry, and finance. In this paper, we present a novel neural network architecture that integrates elements from ResNet structures, while introducing the innovative incorporation of the Taylor series framework. This approach demonstrates notable enhancements in test accuracy across many of the baseline datasets investigated. Furthermore, we extend our method to incorporate a recursive step, which leads to even further improvements in test accuracy. Our findings underscore the potential of our proposed model to significantly advance time series analysis methodologies, offering promising avenues for future research and application.
In recent years, Deep Neural Network-based systems have not only grown in popularity but also gained increasing user trust. However, due to the closed-world assumption of such systems, they cannot recognize samples from unknown classes and often assign an incorrect label with high confidence. The presented work examines the evaluation of methods for Open Set Recognition, focusing on the impact of class imbalance, especially in the dichotomy between known and unknown samples. As an outcome of the problem analysis, we present a set of guidelines for evaluating methods in this field.
This post is co-written with Kostia Kofman and Jenny Tokar from Booking.com. As a global leader in the online travel industry, Booking.com is always seeking innovative ways to enhance its services and provide customers with tailored and seamless experiences. The Ranking team at Booking.com plays a pivotal role in ensuring that the search and recommendation […]
Every country needs to own the production of their own intelligence, NVIDIA founder and CEO Jensen Huang told attendees Monday at the World Governments Summit in Dubai. Huang, who spoke as part of a fireside chat with the UAE’s Minister of AI, His Excellency Omar Al Olama, described sovereign AI — which emphasizes a country’s
Generative AI is driving change across industries — and to take advantage of its benefits, businesses must select the right hardware to power their workflows. The new NVIDIA RTX 2000 Ada Generation GPU delivers the latest AI, graphics and compute technology to compact workstations, offering up to 1.5x the performance of the previous-generation RTX A2000
AI Weirdness: the strange side of machine learning
The power requirements of fifth-generation (5G) and beyond cellular networks are an important constraint in network deployment and require energy-efficient solutions. In this work, we propose a novel user load transfer approach using airborne base stations (BS) mounted on drones for reliable and secure power redistribution across the micro-grid network comprising green small cell BSs. Depending on the user density and the availability of an aerial BS, the energy requirement of a cell with an energy deficit is accommodated by migrating the aerial BS from a high-energy to a low-energy cell. The proposed hybrid drone-based framework integrates long short-term memory with unique cost functions using an evolutionary neural network for drones and BSs and efficiently manages energy and load redistribution. The proposed algorithm reduces power outages at BSs and maintains consistent throughput stability, thereby demonstrating its capability to boost the reliability and robustness of wireless communication systems.
In recent years, there has been an intense debate about how learning in biological neural networks (BNNs) differs from learning in artificial neural networks. It is often argued that the updating of connections in the brain relies only on local information, and therefore a stochastic gradient-descent type optimization method cannot be used. In this paper, we study a stochastic model for supervised learning in BNNs. We show that a (continuous) gradient step occurs approximately when each learning opportunity is processed by many local updates. This result suggests that stochastic gradient descent may indeed play a role in optimizing BNNs.
This work proposes $\mu$GUIDE: a general Bayesian framework to estimate posterior distributions of tissue microstructure parameters from any given biophysical model or MRI signal representation, with exemplar demonstration in diffusion-weighted MRI. Harnessing a new deep learning architecture for automatic signal feature selection combined with simulation-based inference and efficient sampling of the posterior distributions, $\mu$GUIDE bypasses the high computational and time cost of conventional Bayesian approaches and does not rely on acquisition constraints to define model-specific summary statistics. The obtained posterior distributions make it possible to highlight degeneracies present in the model definition and to quantify the uncertainty and ambiguity of the estimated parameters.
We present a framework for learning Hamiltonian systems using data. This work is based on a lifting hypothesis, which posits that nonlinear Hamiltonian systems can be written as nonlinear systems with cubic Hamiltonians. By leveraging this, we obtain quadratic dynamics that are Hamiltonian in a transformed coordinate system. To that end, for given generalized position and momentum data, we propose a methodology to learn quadratic dynamical systems, enforcing the Hamiltonian structure in combination with a weakly-enforced symplectic auto-encoder. The obtained Hamiltonian structure exhibits long-term stability of the system, while the cubic Hamiltonian function provides relatively low model complexity. For low-dimensional data, we determine a higher-dimensional transformed coordinate system, whereas for high-dimensional data, we find a lower-dimensional coordinate system with the desired properties. We demonstrate the proposed methodology by means of both low-dimensional and high-dimensional nonlinear Hamiltonian systems.
Generative Artificial Intelligence (AI) is one of the most exciting developments in Computer Science of the last decade. At the same time, Reinforcement Learning (RL) has emerged as a very successful paradigm for a variety of machine learning tasks. In this survey, we discuss the state of the art, opportunities and open research questions in applying RL to generative AI. In particular, we will discuss three types of applications, namely, RL as an alternative way for generation without specified objectives; as a way for generating outputs while concurrently maximizing an objective function; and, finally, as a way of embedding desired characteristics, which cannot be easily captured by means of an objective function, into the generative process. We conclude the survey with an in-depth discussion of the opportunities and challenges in this fascinating emerging area.
We study the problem of Bayesian fixed-budget best-arm identification (BAI) in structured bandits. We propose an algorithm that uses fixed allocations based on the prior information and the structure of the environment. We provide theoretical bounds on its performance across diverse models, including the first prior-dependent upper bounds for linear and hierarchical BAI. Our key contribution is introducing new proof methods that result in tighter bounds for multi-armed BAI compared to existing methods. We extensively compare our approach to other fixed-budget BAI methods, demonstrating its consistent and robust performance in various settings. Our work improves our understanding of Bayesian fixed-budget BAI in structured bandits and highlights the effectiveness of our approach in practical scenarios.
Recent advances in self-supervised speech models have shown significant improvement in many downstream tasks. However, these models predominantly center on frame-level training objectives, which can fall short in spoken language understanding tasks that require semantic comprehension. Existing works often rely on additional speech-text data as intermediate targets, which is costly in real-world settings. To address this challenge, we propose Pseudo-Word HuBERT (PW-HuBERT), a framework that integrates pseudo word-level targets into the training process, where the targets are derived from a visually grounded speech model, notably eliminating the need for speech-text paired data. Our experimental results on four spoken language understanding (SLU) benchmarks suggest the superiority of our model in capturing semantic information.
Background: Eating disorders are increasingly prevalent, and social networks offer valuable information.
Objective: Our goal was to identify efficient machine learning models for categorizing tweets related to eating disorders.
Methods: Over three months, we collected tweets about eating disorders. A 2,000-tweet subset was labeled for: (1) being written by individuals with eating disorders, (2) promoting eating disorders, (3) informativeness, and (4) scientific content. Both traditional machine learning and deep learning models were employed for classification, assessing accuracy, F1 score, and computational time.
Results: From 1,058,957 collected tweets, transformer-based bidirectional encoder representations achieved the highest F1 scores (71.1%-86.4%) across all four categories.
Conclusions: Transformer-based models outperform traditional techniques in classifying eating disorder-related tweets, though they require more computational resources.
We consider the problem of learning local quantum Hamiltonians given copies of their Gibbs state at a known inverse temperature, following Haah et al. [arXiv:2108.04842] and Bakshi et al. [arXiv:2310.02243]. Our main technical contribution is a new flat polynomial approximation of the exponential function based on the Chebyshev expansion, which enables the formulation of learning quantum Hamiltonians as a polynomial optimization problem. This, in turn, can benefit from the use of moment/SOS relaxations, whose polynomial bit complexity requires careful analysis [O'Donnell, ITCS 2017]. Finally, we show that learning a $k$-local Hamiltonian, whose dual interaction graph is of bounded degree, runs in polynomial time under mild assumptions.
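The paper's flat approximation is its own construction, but its basic ingredient, a Chebyshev expansion of the exponential, is easy to inspect numerically. The sketch below uses NumPy's `Chebyshev.interpolate` on [-1, 1] (the interval and the degrees are arbitrary choices for illustration) and reports the maximum error on a dense grid; because exp is entire, the error falls off very quickly with the degree.

```python
import numpy as np
from numpy.polynomial import Chebyshev


def max_error(deg, num_points=2001):
    """Max |p(x) - exp(x)| on [-1, 1] for the degree-`deg` Chebyshev interpolant p of exp."""
    p = Chebyshev.interpolate(np.exp, deg)  # interpolates at Chebyshev points on [-1, 1]
    xs = np.linspace(-1.0, 1.0, num_points)
    return np.max(np.abs(p(xs) - np.exp(xs)))


for deg in (4, 8, 12):
    print(deg, max_error(deg))
```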
In this paper, we first present \textit{Minecraft-ify}, a character texture generation system specialized for the Minecraft video game and intended for in-game application. It generates face-focused images for texture mapping, tailored to 3D virtual characters with a cube manifold. While existing projects and works only generate textures, the proposed system can invert a user-provided real image or generate an average/random appearance from the learned distribution. Moreover, the output can be manipulated with text guidance using StyleGAN and StyleCLIP. These features provide an extended user experience with greater freedom as a user-friendly AI tool. The project page can be found at https://gh-bumsookim.github.io/Minecraft-ify/
Monitoring the status of large computing systems is essential to identify unexpected behavior and improve their performance and uptime. However, due to the large-scale and distributed design of such computing systems, as well as the large number of monitoring parameters, automated monitoring methods should be applied. Such automated monitoring methods should also be able to adapt to continuous changes in the computing system. In addition, they should identify behavioral anomalies in a timely manner so that appropriate reactions can be taken. This work proposes a general, lightweight, and unsupervised method for near real-time anomaly detection using operational data measurements on large computing systems. The proposed model requires as little as 4 hours of data and 50 epochs for each training process to accurately resemble the behavioral pattern of computing systems.
We introduce off-policy distributional Q($\lambda$), a new addition to the family of off-policy distributional evaluation algorithms. Off-policy distributional Q($\lambda$) does not apply importance sampling for off-policy learning, which introduces intriguing interactions with signed measures. Such unique properties distinguish distributional Q($\lambda$) from other existing alternatives such as distributional Retrace. We characterize the algorithmic properties of distributional Q($\lambda$) and validate theoretical insights with tabular experiments. We show how distributional Q($\lambda$)-C51, a combination of Q($\lambda$) with the C51 agent, exhibits promising results on deep RL benchmarks.
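For readers new to the scalar algorithm, the sketch below computes the off-policy Q(lambda) target for one trajectory in the expected-value setting: it accumulates one-step TD errors toward the target policy pi, discounted by gamma * lambda, with no importance ratios. This is a scalar illustration under assumed tabular inputs (`q[s][a]` values, `pi[s][a]` probabilities, and a trajectory of `(s, a, r, s_next)` tuples with `s_next=None` at termination); the distributional version studied above replaces these scalar values with return distributions.

```python
def q_lambda_target(q, pi, traj, gamma=0.9, lam=0.8):
    """Off-policy Q(lambda) target for the first (state, action) pair in `traj`.

    target = q[s0][a0] + sum_t (gamma * lam)^t * delta_t, where
    delta_t = r_t + gamma * E_{a ~ pi}[q[s_{t+1}][a]] - q[s_t][a_t].
    No importance sampling ratios are used.
    """
    s0, a0 = traj[0][0], traj[0][1]
    target = q[s0][a0]
    coef = 1.0
    for s, a, r, s_next in traj:
        if s_next is None:  # terminal transition: no bootstrap
            expected_next = 0.0
        else:
            expected_next = sum(p * q[s_next][b] for b, p in enumerate(pi[s_next]))
        delta = r + gamma * expected_next - q[s][a]
        target += coef * delta
        coef *= gamma * lam
    return target
```

Setting lam = 0 recovers a one-step expected SARSA target, and with q = 0 and lam = 1 the target reduces to the discounted return.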
We consider the infinite-horizon, average-reward restless bandit problem in discrete time. We propose a new class of policies that are designed to drive a progressively larger subset of arms toward the optimal distribution. We show that our policies are asymptotically optimal with an $O(1/\sqrt{N})$ optimality gap for an $N$-armed problem, provided that the single-armed relaxed problem is unichain and aperiodic. Our approach departs from most existing work that focuses on index or priority policies, which rely on the Uniform Global Attractor Property (UGAP) to guarantee convergence to the optimum, or a recently developed simulation-based policy, which requires a Synchronization Assumption (SA).
In this post, we show you how to build an internal SaaS layer to access foundation models with Amazon Bedrock in a multi-tenant (team) architecture. We specifically focus on usage and cost tracking per tenant and also controls such as usage throttling per tenant. We describe how the solution and Amazon Bedrock consumption plans map to the general SaaS journey framework. The code for the solution and an AWS Cloud Development Kit (AWS CDK) template is available in the GitHub repository.
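The per-tenant throttling and cost tracking described above can be prototyped independently of any AWS service. The sketch below is hypothetical: the class name, the token-bucket throttling policy, and the flat price per 1,000 model tokens are illustrative assumptions, and it makes no Amazon Bedrock API calls. In a deployment, logic like this would sit in the internal SaaS layer in front of model invocations.

```python
import time
from collections import defaultdict


class TenantGateway:
    """Hypothetical per-tenant token-bucket throttle plus usage/cost ledger."""

    def __init__(self, rate_per_sec, burst, price_per_1k_tokens, clock=time.monotonic):
        self.rate = rate_per_sec          # request slots refilled per second, per tenant
        self.burst = burst                # maximum request slots a tenant can accumulate
        self.price = price_per_1k_tokens  # assumed flat price per 1,000 model tokens
        self.clock = clock
        self.buckets = {}                 # tenant -> (available slots, last refill time)
        self.usage = defaultdict(int)     # tenant -> model tokens consumed

    def allow(self, tenant):
        """Return True and consume one slot if the tenant is under its limit."""
        now = self.clock()
        slots, last = self.buckets.get(tenant, (self.burst, now))
        slots = min(self.burst, slots + (now - last) * self.rate)
        allowed = slots >= 1.0
        self.buckets[tenant] = (slots - 1.0 if allowed else slots, now)
        return allowed

    def record(self, tenant, model_tokens):
        """Record the model tokens consumed by one invocation."""
        self.usage[tenant] += model_tokens

    def cost(self, tenant):
        return self.usage[tenant] / 1000 * self.price
```

Injecting `clock` keeps the throttle deterministic in tests; each tenant gets an independent bucket, so one team's burst cannot exhaust another team's quota.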
2024 is the year of great data science predictions targeting big business churn. It is the time to yield benefits from the popular data science frameworks that are streamed to do wonders for industries far and wide. Data science is not just a spoof on the big number game that guides businesses’ growth. It is…
The post 10 Prominent Data Science Predictions 2024- Know What the Industry Experts Say? appeared first on Data Science Central.
Autonomous helicopters made by Rotor Technologies, a startup led by MIT PhDs, take the human out of risky commercial missions.
Many imitation learning (IL) algorithms employ inverse reinforcement learning (IRL) to infer the intrinsic reward function that an expert is implicitly optimizing for based on their demonstrated behaviors. However, in practice, IRL-based IL can fail to accomplish the underlying task due to a misalignment between the inferred reward and the objective of the task. In this paper, we address the susceptibility of IL to such misalignment by introducing a semi-supervised reward design paradigm called Protagonist Antagonist Guided Adversarial Reward (PAGAR). PAGAR-based IL trains a policy to perform well under mixed reward functions instead of a single reward function as in IRL-based IL. We identify the theoretical conditions under which PAGAR-based IL can avoid the task failures caused by reward misalignment. We also present a practical on-and-off policy approach to implementing PAGAR-based IL. Experimental results show that our algorithm outperforms standard IL baselines in complex tasks and challenging transfer settings.
Recently, we demonstrated success of a time-synchronized state estimator using deep neural networks (DNNs) for real-time unobservable distribution systems. In this letter, we provide analytical bounds on the performance of that state estimator as a function of perturbations in the input measurements. It has already been shown that evaluating performance based on only the test dataset might not effectively indicate a trained DNN's ability to handle input perturbations. As such, we analytically verify robustness and trustworthiness of DNNs to input perturbations by treating them as mixed-integer linear programming (MILP) problems. The ability of batch normalization in addressing the scalability limitations of the MILP formulation is also highlighted. The framework is validated by performing time-synchronized distribution system state estimation for a modified IEEE 34-node system and a real-world large distribution system, both of which are incompletely observed by micro-phasor measurement units.
We present a theoretical and empirical analysis of the adaptive entry point selection for graph-based approximate nearest neighbor search (ANNS). We introduce novel concepts: $b\textit{-monotonic path}$ and $B\textit{-MSNET}$, which better capture an actual graph in practical algorithms than existing concepts like MSNET. We prove that adaptive entry point selection offers better performance upper bound than the fixed central entry point under more general conditions than previous work. Empirically, we validate the method's effectiveness in accuracy, speed, and memory usage across various datasets, especially in challenging scenarios with out-of-distribution data and hard instances. Our comprehensive study provides deeper insights into optimizing entry points for graph-based ANNS for real-world high-dimensional data applications.
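To make the role of the entry point concrete, the sketch below runs the basic greedy walk used by graph-based ANNS from two different entry points and counts the hops. It is a toy: the chain graph, the 1-D data, and the "start from the candidate nearest to the query" rule are illustrative assumptions, not the paper's $B$-MSNET construction or its adaptive selection method.

```python
import numpy as np


def greedy_search(graph, points, query, entry):
    """Greedy routing: hop to the neighbor closest to `query` until none is closer.

    Returns (final node, number of hops taken).
    """
    cur = entry
    cur_d = np.linalg.norm(points[cur] - query)
    hops = 0
    while True:
        best, best_d = cur, cur_d
        for nb in graph[cur]:
            d = np.linalg.norm(points[nb] - query)
            if d < best_d:
                best, best_d = nb, d
        if best == cur:  # local minimum reached
            return cur, hops
        cur, cur_d = best, best_d
        hops += 1


def nearest_entry(candidates, points, query):
    """One simple adaptive rule: start from the candidate entry closest to the query."""
    return min(candidates, key=lambda i: np.linalg.norm(points[i] - query))


# Toy data: 10 points on a line, each linked to its immediate neighbors.
points = np.arange(10.0).reshape(-1, 1)
graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
query = np.array([7.2])
print(greedy_search(graph, points, query, entry=0))  # (7, 7)
print(greedy_search(graph, points, query, entry=nearest_entry([0, 9], points, query)))  # (7, 2)
```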
The privacy-utility tradeoff remains one of the fundamental issues of differentially private machine learning. This paper introduces a geometrically inspired kernel-based approach to mitigate the accuracy-loss issue in classification. In this approach, a representation of the affine hull of given data points is learned in Reproducing Kernel Hilbert Spaces (RKHS). This leads to a novel distance measure that hides privacy-sensitive information about individual data points and improves the privacy-utility tradeoff by significantly reducing the risk of membership inference attacks. The effectiveness of the approach is demonstrated through experiments on the MNIST dataset, the Freiburg groceries dataset, and a real biomedical dataset. It is verified that the approach remains computationally practical. The application of the approach to federated learning is considered, and it is observed that the accuracy loss due to data being distributed is either marginal or not significantly high.
We study the problem of training diffusion models to sample from a distribution with a given unnormalized density or energy function. We benchmark several diffusion-structured inference methods, including simulation-based variational approaches and off-policy methods (continuous generative flow networks). Our results shed light on the relative advantages of existing algorithms while bringing into question some claims from past work. We also propose a novel exploration strategy for off-policy methods, based on local search in the target space with the use of a replay buffer, and show that it improves the quality of samples on a variety of target distributions. Our code for the sampling methods and benchmarks studied is made public at https://github.com/GFNOrg/gfn-diffusion as a base for future work on diffusion models for amortized inference.
Graph transformers typically lack direct pair-to-pair communication, instead forcing neighboring pairs to exchange information via a common node. We propose the Triplet Graph Transformer (TGT) that enables direct communication between two neighboring pairs in a graph via novel triplet attention and aggregation mechanisms. TGT is applied to molecular property prediction by first predicting interatomic distances from 2D graphs and then using these distances for downstream tasks. A novel three-stage training procedure and stochastic inference further improve training efficiency and model performance. Our model achieves new state-of-the-art (SOTA) results on open challenge benchmarks PCQM4Mv2 and OC20 IS2RE. We also obtain SOTA results on QM9, MOLPCBA, and LIT-PCBA molecular property prediction benchmarks via transfer learning. We also demonstrate the generality of TGT with SOTA results on the traveling salesman problem (TSP).
This thesis proposes and describes a research attempt at designing and developing a speaker-independent spontaneous automatic speech recognition system for Tigrigna. The acoustic model of the speech recognition system is developed using the Carnegie Mellon University automatic speech recognition development tool (Sphinx), while the SRILM tool is used for the development of the language model.
Keywords: Automatic Speech Recognition, Tigrigna language
We present EfficientViT-SAM, a new family of accelerated segment anything models. We retain SAM's lightweight prompt encoder and mask decoder while replacing the heavy image encoder with EfficientViT. For the training, we begin with the knowledge distillation from the SAM-ViT-H image encoder to EfficientViT. Subsequently, we conduct end-to-end training on the SA-1B dataset. Benefiting from EfficientViT's efficiency and capacity, EfficientViT-SAM delivers 48.9x measured TensorRT speedup on A100 GPU over SAM-ViT-H without sacrificing performance. Our code and pre-trained models are released at https://github.com/mit-han-lab/efficientvit.
The probabilistic formal verification (PFV) of AI systems is in its infancy. So far, approaches have been limited to ad-hoc algorithms for specific classes of models and/or properties.
We propose a unifying framework for the PFV of AI systems based on Weighted Model Integration (WMI), which makes it possible to frame the problem in very general terms.
Crucially, this reduction enables the verification of many properties of interest, like fairness, robustness or monotonicity, over a wide range of machine learning models, without making strong distributional assumptions.
We support the generality of the approach by solving multiple verification tasks with a single, off-the-shelf WMI solver, then discuss the scalability challenges and research directions related to this promising framework.
We consider nonconvex stochastic optimization problems in the asynchronous centralized distributed setup where the communication times from workers to a server cannot be ignored, and the computation and communication times are potentially different for all workers. Using an unbiased compression technique, we develop a new method, Shadowheart SGD, that provably improves the time complexities of all previous centralized methods. Moreover, we show that the time complexity of Shadowheart SGD is optimal in the family of centralized methods with compressed communication. We also consider the bidirectional setup, where broadcasting from the server to the workers is non-negligible, and develop a corresponding method.
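The abstract relies on an unbiased compressor without naming one. A standard example is rand-k sparsification, sketched below purely as an illustration; the choice of rand-k is an assumption, since the abstract does not say which compressor is used. It keeps k coordinates chosen uniformly at random and rescales them by d/k, so each coordinate survives with probability k/d and the compressed vector equals the input in expectation.

```python
import numpy as np


def rand_k(x, k, rng):
    """Unbiased rand-k compressor: keep k random coordinates of x, scaled by d/k.

    Each coordinate survives with probability k/d, so E[rand_k(x)] = x.
    """
    d = x.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(x)
    out[idx] = x[idx] * (d / k)
    return out


rng = np.random.default_rng(0)
x = np.arange(1.0, 11.0)
avg = np.mean([rand_k(x, 3, rng) for _ in range(20_000)], axis=0)
print(np.round(avg, 1))  # approximately x
```

A worker using this compressor would transmit only the k surviving values and their indices; unbiasedness is paid for with extra variance, which grows as k shrinks.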
In this paper, we present a novel sequential team selection model in soccer. Specifically, we model the stochastic process of player injury and unavailability using player-specific information learned from real-world soccer data. Monte-Carlo Tree Search is used to select teams for games that optimise long-term team performance across a soccer season by reasoning over player injury probability. We validate our approach against benchmark solutions for the 2018/19 English Premier League season. Our model achieves similar season expected points to the benchmark whilst reducing first-team injuries by ~13% and the money inefficiently spent on injured players by ~11%, demonstrating the potential to reduce costs and improve player welfare in real-world soccer teams.
The growing use of machine learning (ML) has raised concerns that an ML model may reveal private information about an individual who has contributed to the training dataset. To prevent leakage of sensitive data, we consider using differentially-private (DP), synthetic training data instead of real training data to train an ML model. A key desirable property of synthetic data is its ability to preserve the low-order marginals of the original distribution. Our main contribution comprises novel upper and lower bounds on the excess empirical risk of linear models trained on such synthetic data, for continuous and Lipschitz loss functions. We perform extensive experimentation alongside our theoretical results.
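Low-order marginals are also what many DP synthetic-data generators measure in the first place. As a hedged illustration (not the paper's method), the sketch below releases all 2-way marginal count tables of a binary dataset with the Laplace mechanism. Under add/remove-one-record neighbors, each record contributes exactly one count to each of the C(d, 2) tables, so the L1 sensitivity is C(d, 2) and Laplace noise of scale C(d, 2)/epsilon gives epsilon-DP.

```python
import itertools

import numpy as np


def noisy_two_way_marginals(data, epsilon, rng):
    """Release every 2-way marginal of a binary (n, d) integer array with Laplace noise."""
    _, d = data.shape
    pairs = list(itertools.combinations(range(d), 2))
    scale = len(pairs) / epsilon  # L1 sensitivity C(d, 2) divided by epsilon
    tables = {}
    for i, j in pairs:
        counts = np.zeros((2, 2))
        np.add.at(counts, (data[:, i], data[:, j]), 1.0)  # exact 2x2 contingency table
        tables[(i, j)] = counts + rng.laplace(0.0, scale, size=(2, 2))
    return tables
```

A generator would then fit synthetic records to these noisy tables; that step is outside this sketch.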
NVIDIA has joined the National Institute of Standards and Technology’s new U.S. Artificial Intelligence Safety Institute Consortium as part of the company’s effort to advance safe, secure and trustworthy AI. AISIC will work to create tools, methodologies and standards to promote the safe and trustworthy development and deployment of AI. As a member, NVIDIA will
The GeForce NOW anniversary celebrations continue with more games and a member-exclusive discount on the Logitech G Cloud. Among the six new titles coming to the cloud this week is The Inquisitor from Kalypso Media, which spotlights the GeForce NOW anniversary with a special shout-out. “Congrats to four years of empowering gamers to play anywhere,
Nonprofit fundraising tools can be excellent resources for assisting organizations in maintaining compliance. However, anyone considering these platforms should know a few things to stay on the right track and avoid issues. Organizations must protect donors’ privacy. When a nonprofit’s staff members know details about donors’ sexual orientation, income, race, age and ethnicity, it’s easier…
The post What nonprofits need to know about compliance for fundraising software appeared first on Data Science Central.
Generative AI agents are a versatile and powerful tool for large enterprises. They can enhance operational efficiency, customer service, and decision-making while reducing costs and enabling innovation. These agents excel at automating a wide range of routine and repetitive tasks, such as data entry, customer support inquiries, and content generation. Moreover, they can orchestrate complex, […]
Over the eight months since its release, ChatGPT and its underlying model, GPT3.5, have garnered massive attention due to their potent mix of capability and accessibility. While a niche industry of papers has emerged examining the scope of capabilities these models possess, the information fed to and extracted from these networks has been either natural language text or stylized, code-like language. Drawing inspiration from the prowess we expect a truly human-level intelligent agent to have across multiple signal modalities, in this work we examine GPT3.5's aptitude for visual tasks, where the inputs feature content provided as ASCII-art without overt distillation into a lingual summary. We conduct experiments analyzing the model's performance on image recognition tasks after various transforms typical in visual settings, trials investigating knowledge of image parts, and tasks covering image generation.
We conduct a systematic study of the approximation properties of the Transformer for sequence modeling with long, sparse and complicated memory. We investigate the mechanisms through which different components of the Transformer, such as the dot-product self-attention, positional encoding and feed-forward layer, affect its expressive power, and we study their combined effects through establishing explicit approximation rates. Our study reveals the roles of critical parameters in the Transformer, such as the number of layers and the number of attention heads, and these insights also provide natural suggestions for alternative architectures.
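For reference, two of the components named above, dot-product self-attention and positional encoding, are sketched below in their standard textbook form: single-head scaled dot-product attention softmax(Q K^T / sqrt(d_k)) V and the sinusoidal encoding of the original Transformer. The shapes and random weights are placeholders; the paper's approximation-rate constructions are not reproduced here.

```python
import numpy as np


def sinusoidal_positions(n, d):
    """Sinusoidal positional encoding: sin on even dimensions, cos on odd ones (d even)."""
    pos = np.arange(n)[:, None]
    i = np.arange(0, d, 2)[None, :]
    angles = pos / np.power(10000.0, i / d)
    pe = np.zeros((n, d))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe


def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention; returns (output, attention weights)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V, weights
```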
In this study, we explore the synergy of deep learning and financial market applications, focusing on pair trading. This market-neutral strategy is integral to quantitative finance and is apt for advanced deep-learning techniques. A pivotal challenge in pair trading is discerning temporal correlations among entities, necessitating the integration of diverse data modalities. Addressing this, we introduce a novel framework, Multi-modal Temporal Relation Graph Learning (MTRGL). MTRGL combines time series data and discrete features into a temporal graph and employs a memory-based temporal graph neural network. This approach reframes temporal correlation identification as a temporal graph link prediction task, which has shown empirical success. Our experiments on real-world datasets confirm the superior performance of MTRGL, emphasizing its promise in refining automated pair trading strategies.
In this paper, we perform a non-asymptotic analysis of the federated linear stochastic approximation (FedLSA) algorithm. We explicitly quantify the bias introduced by local training with heterogeneous agents, and investigate the sample complexity of the algorithm. We show that the communication complexity of FedLSA scales polynomially with the desired precision $\epsilon$, which limits the benefits of federation. To overcome this, we propose SCAFFLSA, a novel variant of FedLSA, that uses control variates to correct the bias of local training, and prove its convergence without assumptions on statistical heterogeneity. We apply the proposed methodology to federated temporal difference learning with linear function approximation, and analyze the corresponding complexity improvements.
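The bias that local training introduces, and how a control variate removes it, can be seen in a noise-free toy version. In the sketch below, agents jointly solve mean(A_c) theta = mean(b_c) by running `local_steps` updates on their own (A_c, b_c) before the server averages. The optional correction is a SCAFFOLD-style shift that replaces each agent's local field at the server point with the global one; it is a simplification of the idea, not the SCAFFLSA algorithm itself (there is no stochastic noise, and the global field is computed exactly rather than estimated).

```python
import numpy as np


def fed_lsa(As, bs, rounds=200, local_steps=10, lr=0.05, control_variates=False):
    """Noise-free federated linear SA for mean(A_c) theta = mean(b_c)."""
    A_bar, b_bar = np.mean(As, axis=0), np.mean(bs, axis=0)
    theta = np.zeros(A_bar.shape[0])
    for _ in range(rounds):
        local_iterates = []
        for A, b in zip(As, bs):
            # Control variate: global field minus local field, both at the server point.
            shift = (A_bar @ theta - b_bar) - (A @ theta - b) if control_variates else 0.0
            th = theta.copy()
            for _ in range(local_steps):
                th -= lr * (A @ th - b + shift)
            local_iterates.append(th)
        theta = np.mean(local_iterates, axis=0)  # server averaging
    return theta


# Two heterogeneous agents; the target solution is theta* = [0.25, 0.25].
As = [np.diag([1.0, 3.0]), np.diag([3.0, 1.0])]
bs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
print(fed_lsa(As, bs))                         # biased fixed point
print(fed_lsa(As, bs, control_variates=True))  # close to theta*
```

At theta = theta*, the corrected local update reduces to th -= lr * A_c (th - theta*), so local iterates no longer drift toward each agent's own solution.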
Synthetic Minority Oversampling Technique (SMOTE) is a common rebalancing strategy for handling imbalanced data sets. Asymptotically, we prove that SMOTE (with default parameter) regenerates the original distribution by simply copying the original minority samples. We also prove that SMOTE density vanishes near the boundary of the support of the minority distribution, therefore justifying the common BorderLine SMOTE strategy. Then we introduce two new SMOTE-related strategies, and compare them with state-of-the-art rebalancing procedures. We show that rebalancing strategies are only required when the data set is highly imbalanced. For such data sets, SMOTE, our proposals, or undersampling procedures are the best strategies.
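For readers unfamiliar with the baseline being analyzed, the classic SMOTE step is sketched below: pick a minority sample, pick one of its k nearest minority neighbors, and place a synthetic point uniformly at random on the segment between them. The value k = 5 mirrors a common default (for example, `k_neighbors=5` in imbalanced-learn) and is an assumption here; the brute-force neighbor search is for clarity, not speed. The paper's two new strategies are not reproduced.

```python
import numpy as np


def smote(minority, n_new, k=5, rng=None):
    """Generate n_new synthetic points from an (n, d) array of minority samples."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(minority)
    if not 1 <= k < n:
        raise ValueError("need 1 <= k < number of minority samples")
    dists = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)                # a point is not its own neighbor
    neighbors = np.argsort(dists, axis=1)[:, :k]   # k nearest minority neighbors
    base = rng.integers(0, n, size=n_new)          # randomly chosen seed samples
    nb = neighbors[base, rng.integers(0, k, size=n_new)]
    u = rng.random((n_new, 1))                     # interpolation weights in [0, 1)
    return minority[base] + u * (minority[nb] - minority[base])
```

Every synthetic point lies on a segment between two minority samples, so SMOTE never generates points outside the convex hull of the minority data.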
This paper explores the integration of Explainable Automated Machine Learning (AutoML) in the realm of financial engineering, specifically focusing on its application in credit decision-making. The rapid evolution of Artificial Intelligence (AI) in finance has necessitated a balance between sophisticated algorithmic decision-making and the need for transparency in these systems. The focus is on how AutoML can streamline the development of robust machine learning models for credit scoring, while Explainable AI (XAI) methods, particularly SHapley Additive exPlanations (SHAP), provide insights into the models' decision-making processes. This study demonstrates how the combination of AutoML and XAI not only enhances the efficiency and accuracy of credit decisions but also fosters trust and collaboration between humans and AI systems. The findings underscore the potential of explainable AutoML in improving the transparency and accountability of AI-driven financial decisions, aligning with regulatory requirements and ethical considerations.
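SHAP attributions are based on Shapley values. To show what is being approximated, the sketch below computes exact Shapley values by enumerating every feature coalition, with "absent" features set to a baseline vector; the baseline convention and the toy scoring function are assumptions for illustration. Exact enumeration is exponential in the number of features, which is why practical SHAP tooling relies on approximations or model-specific algorithms.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def score(z):
    """Toy credit score: linear terms plus an interaction between features 0 and 1."""
    return 2.0 * z[0] + 3.0 * z[1] - z[2] + z[0] * z[1]


print(shapley_values(score, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0]))
```

The attributions sum to f(x) - f(baseline) (the efficiency property), and the interaction term is split equally between features 0 and 1.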
This paper introduces a novel decision-making framework that promotes consistency among decisions made by diverse models while utilizing external knowledge. Leveraging the Integer Linear Programming (ILP) framework, we map predictions from various models into globally normalized and comparable values by incorporating information about decisions' prior probability, confidence (uncertainty), and the models' expected accuracy. Our empirical study demonstrates the superiority of our approach over conventional baselines on multiple datasets.
Density Functional Theory (DFT) accurately predicts the quantum chemical properties of molecules, but scales as $O(N_{\text{electrons}}^3)$. Schütt et al. (2019) successfully approximate DFT 1000x faster with Neural Networks (NN). Arguably, the biggest problem one faces when scaling to larger molecules is the cost of DFT labels. For example, it took years to create the PCQ dataset (Nakata & Shimazaki, 2017) on which subsequent NNs are trained within a week. DFT labels molecules by minimizing energy $E(\cdot )$ as a "loss function." We bypass dataset creation by directly training NNs with $E(\cdot )$ as a loss function. For comparison, Schütt et al. (2019) spent 626 hours creating a dataset on which they trained their NN for 160h, for a total of 786h; our method achieves comparable performance within 31h.
Deep learning methods are transforming research, enabling new techniques, and ultimately leading to new discoveries. As the demand for more capable AI models continues to grow, we are now entering an era of Trillion Parameter Models (TPM), or models with more than a trillion parameters -- such as Huawei's PanGu-$\Sigma$. We describe a vision for the ecosystem of TPM users and providers that caters to the specific needs of the scientific community. We then outline the significant technical challenges and open problems in system design for serving TPMs to enable scientific research and discovery. Specifically, we describe the requirements of a comprehensive software stack and interfaces to support the diverse and flexible requirements of researchers.
Gaussian processes are a widely embraced technique for regression and classification due to their good prediction accuracy, analytical tractability and built-in capabilities for uncertainty quantification. However, they suffer from the curse of dimensionality as the number of variables increases. This challenge is generally addressed by assuming additional structure in the problem, the preferred options being either additivity or low intrinsic dimensionality. Our contribution to high-dimensional Gaussian process modeling is to combine such structural assumptions with a multi-fidelity strategy, showcasing the advantages through experiments on synthetic functions and datasets.
We conduct a systematic study of the approximation properties of Transformer for sequence modeling with long, sparse and complicated memory. We investigate the mechanisms through which different components of Transformer, such as the dot-product self-attention, positional encoding and feed-forward layer, affect its expressive power, and we study their combined effects through establishing explicit approximation rates. Our study reveals the roles of critical parameters in the Transformer, such as the number of layers and the number of attention heads, and these insights also provide natural suggestions for alternative architectures.
In this paper, we perform a non-asymptotic analysis of the federated linear stochastic approximation (FedLSA) algorithm. We explicitly quantify the bias introduced by local training with heterogeneous agents, and investigate the sample complexity of the algorithm. We show that the communication complexity of FedLSA scales polynomially with the desired precision $\epsilon$, which limits the benefits of federation. To overcome this, we propose SCAFFLSA, a novel variant of FedLSA, that uses control variates to correct the bias of local training, and prove its convergence without assumptions on statistical heterogeneity. We apply the proposed methodology to federated temporal difference learning with linear function approximation, and analyze the corresponding complexity improvements.
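A scalar toy problem shows the bias that local training with heterogeneous agents introduces. This minimal sketch assumes each hypothetical agent i holds a 1-D linear system $a_i \theta = b_i$, and that the federated target is the solution of the averaged system. The control-variate option is a SCAFFOLD-style correction in the spirit of SCAFFLSA. It is not the paper's exact algorithm or step-size schedule.

```python
from typing import Sequence


def federated_lsa(a: Sequence[float], b: Sequence[float], lr: float = 0.1,
                  local_steps: int = 10, rounds: int = 200,
                  control_variates: bool = False) -> float:
    """Toy federated LSA: agents run local steps on a_i * theta = b_i, then the server averages."""
    n = len(a)
    theta = 0.0
    for _ in range(rounds):
        # Residual of each agent's local system at the current global iterate.
        residuals = [ai * theta - bi for ai, bi in zip(a, b)]
        mean_residual = sum(residuals) / n
        local_iterates = []
        for ai, bi, ri in zip(a, b, residuals):
            # Control variate: shift the local residual so that, at the round's
            # starting point, it equals the global (averaged) residual.
            shift = (mean_residual - ri) if control_variates else 0.0
            local = theta
            for _ in range(local_steps):
                local -= lr * (ai * local - bi + shift)
            local_iterates.append(local)
        theta = sum(local_iterates) / n
    return theta


a, b = [1.0, 3.0], [1.0, 0.0]          # two heterogeneous agents
target = sum(b) / sum(a)               # solution of the averaged system: 0.25
plain = federated_lsa(a, b)
corrected = federated_lsa(a, b, control_variates=True)
print(target, plain, corrected)
```

With ten local steps, plain averaging settles near 0.40 instead of 0.25 in this example, because each agent drifts toward its own solution. The corrected variant recovers 0.25.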
Synthetic Minority Oversampling Technique (SMOTE) is a common rebalancing strategy for handling imbalanced data sets. We prove that, asymptotically, SMOTE (with its default parameter) regenerates the original distribution by simply copying the original minority samples. We also prove that the SMOTE density vanishes near the boundary of the support of the minority distribution, thereby justifying the common Borderline SMOTE strategy. We then introduce two new SMOTE-related strategies and compare them with state-of-the-art rebalancing procedures. We show that rebalancing strategies are only required when the data set is highly imbalanced. For such data sets, SMOTE, our proposals, or undersampling procedures are the best strategies.
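SMOTE generates each synthetic point by interpolating between a minority sample and one of its k nearest minority neighbours. This is the step the analysis above studies. The sketch below is minimal plain Python. It uses k = 5, the default commonly found in implementations such as imbalanced-learn; the function name and signature are our own.

```python
import random
from typing import List, Sequence


def smote(minority: Sequence[Sequence[float]], n_new: int, k: int = 5,
          seed: int = 0) -> List[List[float]]:
    """Generate n_new synthetic minority points by SMOTE-style interpolation."""
    points = [list(p) for p in minority]
    if len(points) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(points) - 1)
    rng = random.Random(seed)

    def sq_dist(p: List[float], q: List[float]) -> float:
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(points))
        base = points[i]
        # k nearest minority neighbours of the base point, excluding the point itself.
        neighbours = sorted((j for j in range(len(points)) if j != i),
                            key=lambda j: sq_dist(base, points[j]))[:k]
        other = points[rng.choice(neighbours)]
        lam = rng.random()  # uniform position on the segment from base to other
        synthetic.append([bv + lam * (ov - bv) for bv, ov in zip(base, other)])
    return synthetic


minority = [[float(t), 2.0 * t] for t in range(6)]   # minority points on the line y = 2x
new_points = smote(minority, n_new=20, k=3)
print(new_points[:3])
```

Every synthetic point lies on a segment between two minority samples, so it never leaves their convex hull. Here, all new points stay on the line y = 2x.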
In the first post of this three-part series, we presented a solution that demonstrates how you can automate detecting document tampering and fraud at scale using AWS AI and machine learning (ML) services for a mortgage underwriting use case. In the second post, we discussed an approach to develop a deep learning-based computer vision model […]
The emergence of large language models (LLMs) has revolutionized the way people create text and interact with computing. However, these models are limited in ensuring the accuracy of the content they generate and enforcing strict compliance with specific formats, such as JSON and other computer programming languages. Additionally, LLMs that process information from multiple sources […]
The post AI Controller Interface: Generative AI with a lightweight, LLM-integrated VM appeared first on Microsoft Research.
Research Focus: New Research Forum series explores bold ideas in the era of AI; LASER improves reasoning in language models; Cache-Efficient Top-k Aggregation over High Cardinality Large Datasets; Six Microsoft researchers named 2023 ACM Fellows.
The post Research Focus: Week of February 5, 2024 appeared first on Microsoft Research.
With advances in computing, sophisticated AI models and machine learning are having a profound impact on business and society. Industries can use AI to quickly analyze vast bodies of data, allowing them to derive meaningful insights, make predictions and automate processes for greater efficiency. In the public sector, government agencies are achieving superior disaster preparedness.
The Koopman operator serves as the theoretical backbone for machine learning of dynamical control systems, where the operator is heuristically approximated by extended dynamic mode decomposition (EDMD). In this paper, we propose Stability- and certificate-oriented EDMD (SafEDMD): a novel EDMD-based learning architecture that comes with rigorous certificates, resulting in a reliable surrogate model generated in a data-driven fashion. To ensure the trustworthiness of SafEDMD, we derive proportional error bounds, which vanish at the origin and are tailored for control tasks, leading to certified controller design based on semi-definite programming. We illustrate the developed machinery by means of several benchmark examples and highlight the advantages over state-of-the-art methods.
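SafEDMD builds on plain EDMD, which fits a Koopman matrix by least squares in a lifted dictionary space. This minimal sketch assumes scalar states and a hypothetical dictionary $\psi(x) = [x, x^2]$. It omits control inputs, the proportional error bounds, and the SDP-based controller design that distinguish SafEDMD.

```python
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def invert(m: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting (fine for small matrices)."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("Gram matrix is singular; use more varied samples")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [av - factor * cv for av, cv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def edmd(xs: Sequence[float], ys: Sequence[float],
         dictionary: Callable[[float], List[float]]) -> Matrix:
    """Least-squares K such that psi(y_k) ~= K psi(x_k) over snapshot pairs (x_k, y_k)."""
    psi_x = [dictionary(x) for x in xs]
    psi_y = [dictionary(y) for y in ys]
    n = len(psi_x[0])
    # Gram matrix G = sum psi(x) psi(x)^T and cross term A = sum psi(y) psi(x)^T.
    gram = [[sum(p[i] * p[j] for p in psi_x) for j in range(n)] for i in range(n)]
    cross = [[sum(q[i] * p[j] for p, q in zip(psi_x, psi_y)) for j in range(n)]
             for i in range(n)]
    gram_inv = invert(gram)
    # Normal equations K G = A, hence K = A G^{-1}.
    return [[sum(cross[i][k] * gram_inv[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


xs = [1.0, 2.0, -1.0, 0.5, 3.0]
ys = [0.5 * x for x in xs]          # snapshots of x_{k+1} = 0.5 x_k
K = edmd(xs, ys, lambda x: [x, x * x])
print(K)
```

For the linear map $x_{k+1} = 0.5\,x_k$, the lifted dynamics are exactly $\mathrm{diag}(0.5, 0.25)$, and the fit recovers this matrix up to rounding.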
This work presents an analysis of the effectiveness of using standard shallow feed-forward networks to mimic the behavior of the attention mechanism in the original Transformer model, a state-of-the-art architecture for sequence-to-sequence tasks. We substitute key elements of the attention mechanism in the Transformer with simple feed-forward networks, trained using the original components via knowledge distillation. Our experiments, conducted on the IWSLT2017 dataset, reveal the capacity of these "attentionless Transformers" to rival the performance of the original architecture. Through rigorous ablation studies, and experimenting with various replacement network types and sizes, we offer insights that support the viability of our approach. This not only sheds light on the adaptability of shallow feed-forward networks in emulating attention mechanisms but also underscores their potential to streamline complex architectures for sequence-to-sequence tasks.
We consider the problem of designing sample-efficient learning algorithms for infinite-horizon discounted-reward Markov Decision Processes (MDPs). Specifically, we propose the Accelerated Natural Policy Gradient (ANPG) algorithm, which uses an accelerated stochastic gradient descent procedure to obtain the natural policy gradient. ANPG achieves $\mathcal{O}({\epsilon^{-2}})$ sample complexity and $\mathcal{O}(\epsilon^{-1})$ iteration complexity with general parameterization, where $\epsilon$ denotes the optimality error. This improves the state-of-the-art sample complexity by a $\log(\frac{1}{\epsilon})$ factor. ANPG is a first-order algorithm and, unlike some existing work, does not require the unverifiable assumption that the variance of importance sampling (IS) weights is upper bounded. In the class of Hessian-free and IS-free algorithms, ANPG beats the best-known sample complexity by a factor of $\mathcal{O}(\epsilon^{-\frac{1}{2}})$ and simultaneously matches their state-of-the-art iteration complexity.
Studying the complex interactions between different brain regions is crucial in neuroscience. Various statistical methods have explored the latent communication across multiple brain regions. Two main categories are Gaussian Process (GP) and Linear Dynamical System (LDS) approaches, each with unique strengths. The GP-based approach effectively discovers latent variables such as frequency bands and communication directions. Conversely, the LDS-based approach is computationally efficient but has less expressive latent representations. In this study, we merge both methodologies by constructing an LDS that mirrors a multi-output GP, termed the Multi-Region Markovian Gaussian Process (MRM-GP). Our work is the first to establish a connection between an LDS and a multi-output GP that explicitly models frequencies and phase delays within the latent space of neural recordings. Consequently, the model achieves a linear inference cost over time points and provides an interpretable low-dimensional representation, revealing communication directions across brain regions and separating oscillatory communications into different frequency bands.
Transformer-based models still face the structural limitation of fixed context length in processing long sequence input despite their effectiveness in various fields. While various external memory techniques were introduced, most previous techniques fail to avoid fateful forgetting, where even the most important memories are inevitably forgotten after a sufficient number of time steps. We designed Memoria, a memory system for artificial neural networks, drawing inspiration from humans and applying various neuroscientific and psychological theories related to memory. Experimentally, we demonstrated the effectiveness of Memoria in tasks such as sorting and language modeling, surpassing conventional techniques.
Deep reinforcement learning (DRL) has significantly advanced the field of combinatorial optimization (CO). However, its practicality is hindered by the need for a large number of reward evaluations, especially in scenarios involving computationally intensive function evaluations. To enhance sample efficiency, we propose a simple but effective method, called symmetric replay training (SRT), which can be easily integrated into various DRL methods. Our method leverages high-reward samples to encourage exploration of under-explored symmetric regions without any additional online interactions. Through replay training, the policy is trained to maximize the likelihood of the symmetric trajectories of discovered high-reward samples. Experimental results demonstrate that our method consistently improves sample efficiency across diverse DRL methods applied to real-world tasks, such as molecular optimization and hardware design.
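In routing-style CO problems, many different action sequences encode the same solution. This minimal sketch assumes a TSP-style tour whose rotations and reversals are equivalent. That symmetry choice is ours for illustration; the abstract does not list which symmetries SRT exploits. A single discovered high-reward tour yields several symmetric trajectories with identical reward, and these could be replayed without new reward evaluations. The likelihood-maximization step itself is not shown.

```python
import math
from typing import List, Sequence, Tuple

Tour = Tuple[int, ...]


def tour_length(tour: Sequence[int], coords: Sequence[Tuple[float, float]]) -> float:
    """Length of the closed tour that visits coords in the given order."""
    n = len(tour)
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % n]]) for i in range(n))


def symmetric_tours(tour: Sequence[int]) -> List[Tour]:
    """All rotations and reversals of a closed tour: same solution, different action sequence."""
    n = len(tour)
    variants = set()
    for seq in (list(tour), list(reversed(tour))):
        for shift in range(n):
            variants.add(tuple(seq[shift:] + seq[:shift]))
    return sorted(variants)


coords = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
best = (0, 2, 1, 3)                      # a hypothetical high-reward tour
variants = symmetric_tours(best)
print(len(variants), [round(tour_length(v, coords), 6) for v in variants])
```

All variants share the same tour length, so their reward is already known and no extra evaluations are needed.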
In this paper, we clarify the crucial difference between a deep neural network and the Fourier series. For the multiple Fourier series of periodizations of some radial functions on $\mathbb{R}^d$, Kuratsubo (2010) investigated the behavior of the spherical partial sums and discovered a third phenomenon besides the well-known Gibbs-Wilbraham and Pinsky phenomena. In particular, this third phenomenon prevents pointwise convergence. In contrast, we give a specific deep neural network and prove its pointwise convergence.
Consensus control in multi-agent systems has received significant attention and practical implementation across various domains. However, managing consensus control under unknown dynamics remains a significant challenge for control design due to system uncertainties and environmental disturbances. This paper presents a novel learning-based distributed control law augmented by auxiliary dynamics. Gaussian processes are used to compensate for the unknown components of the multi-agent system. To continuously improve the predictive performance of the Gaussian process model, a data-efficient online learning strategy with a decentralized event-triggered mechanism is proposed. Furthermore, the control performance of the proposed approach is guaranteed via Lyapunov theory, based on a probabilistic bound on the prediction error. To demonstrate the efficacy of the proposed learning-based controller, a comparative analysis is conducted against both conventional distributed control laws and offline learning methodologies.
We examine the impact of homograph attacks on the Sentiment Analysis (SA) task for different Arabic dialects from the Maghreb North-African countries. Homograph attacks cause a 65.3% decrease in transformer classification performance, from an F1-score of 0.95 to 0.33, when data is written in "Arabizi". The goal of this study is to highlight LLMs' weaknesses and to prioritize ethical and responsible Machine Learning.
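A homograph attack replaces characters with visually similar code points. The text looks unchanged to a reader, but a model tokenizes it differently. This minimal sketch assumes a hypothetical Latin-to-Cyrillic look-alike table and an illustrative Arabizi-style string; the substitutions used in the study are not given here. It also reproduces the arithmetic behind the reported 65.3% drop.

```python
from typing import Dict

# Hypothetical look-alike table; real attacks may use other characters or scripts.
HOMOGLYPHS: Dict[str, str] = {
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "o": "\u043e",  # CYRILLIC SMALL LETTER O
    "p": "\u0440",  # CYRILLIC SMALL LETTER ER
    "c": "\u0441",  # CYRILLIC SMALL LETTER ES
}


def homograph_attack(text: str, table: Dict[str, str] = HOMOGLYPHS) -> str:
    """Swap characters for look-alike code points, keeping the text readable to humans."""
    return "".join(table.get(ch, ch) for ch in text)


def relative_drop(before: float, after: float) -> float:
    """Relative decrease of a metric, e.g. an F1-score."""
    return (before - after) / before


sample = "ana far7ana"               # illustrative Arabizi-style input
attacked = homograph_attack(sample)
print(sample == attacked, len(sample) == len(attacked))
print(f"{relative_drop(0.95, 0.33):.1%}")
```

The attacked string has the same length and appearance as the original but different code points. A tokenizer trained on Latin-script Arabizi will therefore map it to unfamiliar tokens.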
This paper introduces DogSurf, a new approach to using quadruped robots to help visually impaired people navigate in the real world. The presented method allows the quadruped robot to detect slippery surfaces and to use audio and haptic feedback to inform the user when to stop. A state-of-the-art GRU-based neural network architecture with a mean accuracy of 99.925% was proposed for multiclass surface classification on quadruped robots. A dataset was collected on a Unitree Go1 Edu robot. The dataset and code have been posted to the public domain.
This work presents an innovative learning-based approach to the tracking control problem of Euler-Lagrange multi-agent systems with partially unknown dynamics operating under switching communication topologies. The approach leverages a correlation-aware cooperative algorithm framework built upon Gaussian process regression, which captures inter-agent correlations for uncertainty predictions. A standout feature is its efficiency in deriving the aggregation weights, achieved by circumventing computationally intensive posterior variance calculations. Through Lyapunov stability analysis, the distributed control law is shown to ensure bounded tracking errors with high probability. Simulation experiments validate the protocol's efficacy in managing complex scenarios, establishing it as a promising solution for robust tracking control in multi-agent systems with uncertain dynamics and dynamic communication structures.
This literature review gives an overview of current approaches to domain adaptation and to multilingual semantic search in low-resource settings. We developed a new typology that clusters domain adaptation approaches by the part of the dense textual information retrieval system they adapt, focusing on how to combine them efficiently. We also explore the possibilities of combining multilingual semantic search with domain adaptation approaches for dense retrievers in a low-resource setting.
Adversarial Malware Generation (AMG), the generation of adversarial malware variants to strengthen Deep Learning (DL)-based malware detectors, has emerged as a crucial tool in the development of proactive cyberdefense. However, most existing works apply subtle perturbations or additions to executable files and do not explore full-file obfuscation. In this study, we show that an open-source encryption tool coupled with a Reinforcement Learning (RL) framework can successfully obfuscate malware to evade state-of-the-art malware detection engines and outperform techniques that use advanced modification methods. Our results show that the proposed method improves the evasion rate by 27%-49% compared to widely used state-of-the-art RL-based methods.
Applied recommender systems research is in a curious position. While there is a very rigorous protocol for measuring performance by A/B testing, best practice for finding a 'B' to test does not explicitly target performance but rather targets a proxy measure. The success or failure of a given A/B test then depends entirely on whether the proposed proxy is better correlated with performance than the previous proxy. No principle exists to identify offline whether one proxy is better than another, leaving practitioners shooting in the dark. The purpose of this position paper is to question this anti-Utopian thinking and argue that a non-standard use of deep learning stacks actually has the potential to unlock reward-optimizing recommendation.
With the increasing use of big data and business analytics, data storytelling has gained popularity as an effective means of communicating analytical insights to audiences to support decision making and improve business performance. However, there is little empirical evidence on the impact of data storytelling on data understanding. This study validates the concept of data storytelling as a construct in terms of its impact on users' data understanding. Based on empirical data analysis, the results of this study show that data storytelling competence is positively associated with organizational performance, which is partly due to the quality of the decisions made. These results provide a theoretical basis for further investigation of potential antecedents and consequences of data storytelling.
In digital health, allocating a limited treatment budget across available risk times is crucial to reducing user fatigue. This strategy, however, faces a significant obstacle: the actual number of risk times is unknown, a factor not adequately addressed by existing methods, which lack theoretical guarantees. This paper introduces, for the first time, the online uniform risk times sampling problem within the approximation algorithm framework. We propose two online approximation algorithms for this problem, one with and one without learning augmentation, and provide rigorous theoretical performance guarantees for them using competitive ratio analysis. We assess the performance of our algorithms using both synthetic experiments and a real-world case study on the HeartSteps mobile application.
A methodology that seeks to enhance model prediction performance is presented. The method involves generating multiple auxiliary models that capture relationships between attributes as a function of each other. Such information serves to generate additional informative columns in the dataset that can potentially enhance target prediction. A proof of concept and related code are provided.
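The abstract does not specify the family of auxiliary models. The minimal sketch below assumes one-variable linear regressions. For every ordered pair of attributes it fits a model that predicts one attribute from the other, then appends the predictions as a new column. The column naming scheme and helper names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y ~ intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        return my, 0.0  # constant predictor: fall back to the mean of y
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope


def add_auxiliary_columns(data: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """For every ordered pair (source, target), add a column predicting target from source."""
    out = {name: list(col) for name, col in data.items()}
    for target, y in data.items():
        for source, x in data.items():
            if source == target:
                continue
            intercept, slope = fit_line(x, y)
            out[f"{target}_from_{source}"] = [intercept + slope * xi for xi in x]
    return out


data = {"a": [1.0, 2.0, 3.0, 4.0], "b": [3.0, 5.0, 7.0, 9.0], "c": [2.0, 1.0, 4.0, 3.0]}
augmented = add_auxiliary_columns(data)
print(sorted(augmented))
```

In practice, fit the auxiliary models on training rows only and then apply them to validation and test rows. Otherwise the new columns leak information across the split.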
The paper presents the exact formula for the vector field that minimizes the loss for the standard flow. This formula depends analytically on a given distribution $\rho_0$ and an unknown one $\rho_1$. Based on the presented formula, a new loss and algorithm for training a vector field model in the style of Conditional Flow Matching are provided. Our loss, in comparison to the standard Conditional Flow Matching approach, exhibits smaller variance when evaluated through Monte Carlo sampling methods. Numerical experiments on synthetic models and on high-dimensional tabular data demonstrate better learning results with the presented algorithm.
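One common form of the standard Conditional Flow Matching objective works as follows. It uses straight paths $x_t = (1 - t)\,x_0 + t\,x_1$ with $x_0 \sim \rho_0$ and $x_1 \sim \rho_1$, and it regresses the model onto the conditional velocity $x_1 - x_0$. Below is a minimal 1-D Monte Carlo sketch of that baseline loss. The paper's lower-variance loss is not reproduced, and the function names are our own.

```python
import random
from typing import Callable

Sampler = Callable[[random.Random], float]


def cfm_loss(vector_field: Callable[[float, float], float],
             sample_x0: Sampler, sample_x1: Sampler,
             n_samples: int = 1000, seed: int = 0) -> float:
    """Monte Carlo estimate of a standard (straight-path) conditional flow matching loss."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x0, x1 = sample_x0(rng), sample_x1(rng)
        t = rng.random()
        xt = (1.0 - t) * x0 + t * x1      # point on the straight path from x0 to x1
        target = x1 - x0                  # conditional velocity along that path
        total += (vector_field(t, xt) - target) ** 2
    return total / n_samples


# Independent x0 ~ N(0, 1) and x1 ~ N(3, 1); a constant field v = 3 leaves
# only the variance of x1 - x0 (which is 2) as irreducible loss.
estimate = cfm_loss(lambda t, x: 3.0, lambda r: r.gauss(0.0, 1.0), lambda r: r.gauss(3.0, 1.0))
print(estimate)
```

The stochastic example shows why variance matters. Even the best constant field leaves noisy per-sample residuals, and lowering the variance of the Monte Carlo estimate is what the paper's loss targets.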
Infrared (IR) spectroscopy is a pivotal technique in chemical research for elucidating molecular structures and dynamics through vibrational and rotational transitions. However, the intricate molecular fingerprints characterized by unique vibrational and rotational patterns present substantial analytical challenges. Here, we present a machine learning approach employing a Structural Attention Mechanism tailored to enhance the prediction and interpretation of infrared spectra, particularly for diazo compounds. Our model distinguishes itself by homing in on chemical information proximal to functional groups, thereby significantly bolstering the accuracy, robustness, and interpretability of spectral predictions. This method not only demystifies the correlations between infrared spectral features and molecular structures but also offers a scalable and efficient paradigm for dissecting complex molecular interactions.
This paper investigates the impact of multiscale data on machine learning algorithms, particularly in the context of deep learning. A dataset is multiscale if its distribution shows large variations in scale across different directions. This paper reveals multiscale structures in the loss landscape, including its gradients and Hessians inherited from the data. Correspondingly, it introduces a novel gradient descent approach, drawing inspiration from multiscale algorithms used in scientific computing. This approach seeks to transcend empirical learning rate selection, offering a more systematic, data-informed strategy to enhance training efficiency, especially in the later stages.
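A quadratic loss whose Hessian has very different curvatures in two directions shows why multiscale data makes learning-rate selection hard. This minimal sketch is not the paper's multiscale method. Plain gradient descent is stable only when the learning rate is below $2/\lambda_{\max}$, and that bound makes progress along the flat direction very slow.

```python
from typing import Tuple


def gradient_descent(curv_fast: float, curv_slow: float, lr: float, steps: int,
                     start: Tuple[float, float] = (1.0, 1.0)) -> Tuple[float, float]:
    """Plain GD on f(x, y) = 0.5 * (curv_fast * x**2 + curv_slow * y**2)."""
    x, y = start
    for _ in range(steps):
        x -= lr * curv_fast * x   # gradient step along the stiff direction
        y -= lr * curv_slow * y   # gradient step along the flat direction
    return x, y


stable = gradient_descent(100.0, 0.01, lr=0.019, steps=100)    # lr just below 2 / 100
unstable = gradient_descent(100.0, 0.01, lr=0.021, steps=100)  # lr just above 2 / 100
print(stable, unstable)
```

With curvatures 100 and 0.01, a learning rate of 0.019 drives the stiff coordinate below $10^{-4}$ in 100 steps, while the flat coordinate barely moves (about 0.98). Raising the rate to 0.021 makes the stiff coordinate diverge.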
Despite deep learning's widespread success, its data-hungry and computationally expensive nature makes it impractical for many data-constrained real-world applications. Few-Shot Learning (FSL) aims to address these limitations by enabling rapid adaptation to novel learning tasks, seeing significant growth in recent years. This survey provides a comprehensive overview of the field's latest advancements. Initially, FSL is formally defined, and its relationship with different learning fields is presented. A novel taxonomy is introduced, extending previously proposed ones, and real-world applications in classic and novel fields are described. Finally, recent trends shaping the field, outstanding challenges, and promising future research directions are discussed.
We develop HTBB, a new method for multidimensional black-box approximation and gradient-free optimization, based on the low-rank hierarchical Tucker decomposition with the MaxVol index selection procedure. Numerical experiments on 14 complex model problems demonstrate the robustness of the proposed method for dimensions up to 1000. It also yields significantly more accurate results than classical gradient-free optimization methods and than approximation and optimization methods based on the popular tensor train decomposition, which represents a simpler case of a tensor network.
We consider the problem of real-time reconstruction of urban air pollution maps. The task is challenging due to the heterogeneous sources of available data, the scarcity of direct measurements, the presence of noise, and the large surfaces that need to be considered. In this work, we introduce different reconstruction methods based on posing the problem on city graphs. Our strategies can be classified as fully data-driven, physics-driven, or hybrid, and we combine them with super-learning models. The performance of the methods is tested in the case of the inner city of Paris, France.
This paper derives statistical guarantees for the performance of Graph Neural Networks (GNNs) in link prediction tasks on graphs generated by a graphon. We propose a linear GNN architecture (LG-GNN) that produces consistent estimators for the underlying edge probabilities. We establish a bound on the mean squared error and give guarantees on the ability of LG-GNN to detect high-probability edges. Our guarantees hold for both sparse and dense graphs. Finally, we demonstrate some of the shortcomings of the classical GCN architecture, as well as verify our results on real and synthetic datasets.
Blanket statements of equivalence between causal concepts and purely probabilistic concepts should be approached with care. In this short note, I examine a recent claim that counterfactual fairness is equivalent to demographic parity. The claim fails to hold up upon closer examination. I will take the opportunity to address some broader misunderstandings about counterfactual fairness.
Most existing federated learning (FL) methodologies have assumed training begins from a randomly initialized model. Recently, several studies have empirically demonstrated that leveraging a pre-trained model can offer advantageous initializations for FL. In this paper, we propose a collaborative pre-training approach, CoPreFL, which strategically designs a pre-trained model to serve as a good initialization for any downstream FL task. The key idea of our pre-training algorithm is a meta-learning procedure which mimics downstream distributed scenarios, enabling it to adapt to any unforeseen FL task. CoPreFL's pre-training optimization procedure also strikes a balance between average performance and fairness, with the aim of addressing these competing challenges in downstream FL tasks through intelligent initializations. Extensive experimental results validate that our pre-training method provides a robust initialization for any unseen downstream FL task, resulting in enhanced average performance and more equitable predictions.
This paper explores infinite-horizon average-reward Constrained Markov Decision Processes (CMDPs). To the best of our knowledge, this work is the first to analyze the regret and constraint violation of average-reward CMDPs with a general policy parametrization. To address this challenge, we propose a primal-dual policy gradient algorithm that manages the constraints while ensuring a low regret guarantee toward a globally optimal policy. In particular, we show that our proposed algorithm achieves $\tilde{\mathcal{O}}({T}^{3/4})$ objective regret and $\tilde{\mathcal{O}}({T}^{3/4})$ constraint violation bounds.
This post is co-written with Ilan Geller, Shuyu Yang and Richa Gupta from Accenture. Bringing innovative new pharmaceutical drugs to market is a long and stringent process. Companies face complex regulations and extensive approval requirements from governing bodies like the US Food and Drug Administration (FDA). A key part of the submission process is authoring […]
Do your employees wait for hours on the telephone to open an IT ticket? Do they wait for an agent to triage an issue, which sometimes only requires restarting the computer? Providing excellent IT support is crucial for any organization, but legacy systems have relied heavily on human agents being available to intake reports and […]
In this post, we show how to develop an ML-driven solution using Amazon SageMaker for detecting adverse events using the publicly available Adverse Drug Reaction Dataset on Hugging Face. In this solution, we fine-tune a variety of models on Hugging Face that were pre-trained on medical data; of those tried, the BioBERT model, which was pre-trained on the PubMed dataset, performs best.
The graduate students will aim to commercialize innovations in AI, machine learning, and data science.
Mr_Vudoo is a digital renaissance man — a livestreamer, video editor, gamer and entertainer skilled in producing an array of content for his audience.
Recently, there has been a growing interest in mixed-categorical metamodels based on Gaussian Process (GP) for Bayesian optimization. In this context, different approaches can be used to build the mixed-categorical GP. Many of these approaches involve a high number of hyperparameters; in fact, the more general and precise the strategy used to build the GP, the greater the number of hyperparameters to estimate. This paper introduces an innovative dimension reduction algorithm that relies on partial least squares regression to reduce the number of hyperparameters used to build a mixed-variable GP. Our goal is to generalize classical dimension reduction techniques commonly used within GP (for continuous inputs) to handle mixed-categorical inputs. The potential of the proposed method is demonstrated in both structural and multidisciplinary application contexts. The targeted applications include the analysis of a cantilever beam as well as the optimization of a green aircraft, resulting in a significant 439-kilogram reduction in fuel consumption during a single mission.
It is well established that the Lipschitz constant of a neural network plays a prominent role in ensuring or certifying its robustness. However, its computation is NP-hard. In this note, by taking into account activation regions at each layer as new constraints, we propose new quadratically constrained MIP formulations for the neural network Lipschitz estimation problem. The solutions of these problems give lower and upper bounds on the Lipschitz constant, and we detail conditions under which they coincide with the exact Lipschitz constant.
Compositional generalization is one of the main properties that differentiates lexical learning in humans from state-of-the-art neural networks. We propose a general framework for building models that can generalize compositionally using the concept of Generalized Grammar Rules (GGRs), a class of symmetry-based compositional constraints for transduction tasks, which we view as a transduction analogue of equivariance constraints in physics-inspired tasks. Besides formalizing generalized notions of symmetry for language transduction, our framework is general enough to contain many existing works as special cases. We present ideas on how GGRs might be implemented, and in the process draw connections to reinforcement learning and other areas of research.
A major challenge in sample-based inference (SBI) for Bayesian neural networks is the size and structure of the networks' parameter space. Our work shows that successful SBI is possible by embracing the characteristic relationship between weight and function space, uncovering a systematic link between overparameterization and the difficulty of the sampling problem. Through extensive experiments, we establish practical guidelines for sampling and convergence diagnosis. As a result, we present a Bayesian deep ensemble approach as an effective solution with competitive performance and uncertainty quantification.
In this work, we propose a model-agnostic, instance-based, post-hoc explainability method for time series classification. The proposed algorithm, Time-CF, leverages shapelets and TimeGAN to provide counterfactual explanations for arbitrary time series classifiers. We validate the proposed method on several real-world univariate time series classification tasks from the UCR Time Series Archive. The results indicate that, compared to state-of-the-art methods, the counterfactual instances generated by Time-CF perform better in terms of four explainability metrics: closeness, sensibility, plausibility, and sparsity.
( 2
min )
For autonomous mobile robots, uncertainties in the environment and system model can lead to failure in the motion planning pipeline, resulting in potential collisions. In order to achieve a high level of robust autonomy, these robots should be able to proactively predict and recover from such failures. To this end, we propose a Gaussian Process (GP) based model for proactively detecting the risk of future motion planning failure. When this risk exceeds a certain threshold, a recovery behavior is triggered that leverages the same GP model to find a safe state from which the robot may continue towards the goal. The proposed approach is trained in simulation only and can generalize to real world environments on different robotic platforms. Simulations and physical experiments demonstrate that our framework is capable of both predicting planner failures and recovering the robot to states where planner success is likely, all while producing agile motion.
( 2
min )
In molecular dynamics (MD) simulations, rare events, such as protein folding, are typically studied by means of enhanced sampling techniques, most of which rely on the definition of a collective variable (CV) along which the acceleration occurs. Obtaining an expressive CV is crucial, but often hindered by the lack of information about the particular event, e.g., the transition from unfolded to folded conformation. We propose a simulation-free data augmentation strategy using physics-inspired metrics to generate geodesic interpolations resembling protein folding transitions, thereby improving sampling efficiency without true transition state samples. Leveraging interpolation progress parameters, we introduce a regression-based learning scheme for CV models, which outperforms classifier-based methods when transition state data is limited and noisy.
( 2
min )
We seek to enable classic processing of continuous, ultra-sparse spatiotemporal data generated by event-based sensors with dense machine learning models. We propose a novel hybrid pipeline composed of asynchronous sensing and synchronous processing that combines several ideas: (1) an embedding based on PointNet models (the ALERT module) that can continuously integrate new events and dismiss old ones thanks to a leakage mechanism; (2) a flexible readout of the embedded data that can feed any downstream model with always up-to-date features at any sampling rate; and (3) a patch-based approach, inspired by the Vision Transformer, that exploits input sparsity to improve the efficiency of the method. These embeddings are then processed by a transformer model trained for object and gesture recognition. Using this approach, we achieve state-of-the-art performance with lower latency than competing methods. We also demonstrate that our asynchronous model can operate at any desired sampling rate.
( 2
min )
Recent work has described the presence of the embedding gap in neural network verification. On one side of the gap is a high-level specification about the network's behaviour, written by a domain expert in terms of the interpretable problem space. On the other side is a logically equivalent set of satisfiability queries, expressed in the uninterpretable embedding space in a form suitable for neural network solvers. In this paper we describe an algorithm for compiling the former to the latter. We explore and overcome complications that arise from targeting neural network solvers as opposed to standard SMT solvers.
( 2
min )
Building on SGD, previous works have proposed many algorithms that improve convergence speed and generalization in stochastic optimization, such as SGDm, AdaGrad, and Adam. However, their convergence analysis under non-convex conditions is challenging. In this work, we propose a unified framework to address this issue. For any first-order method, we interpret the update direction $g_t$ as the sum of the stochastic subgradient $\nabla f_t(x_t)$ and an additional acceleration term $\frac{2|\langle v_t, \nabla f_t(x_t) \rangle|}{\|v_t\|_2^2} v_t$, so that convergence can be discussed by analyzing $\langle v_t, \nabla f_t(x_t) \rangle$. Through this framework, we discover two plug-and-play acceleration methods, \textbf{Reject Accelerating} and \textbf{Random Vector Accelerating}, and we theoretically demonstrate that both can directly improve the convergence rate.
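Taking the inner product of the stated decomposition with the stochastic gradient shows why $\langle v_t, \nabla f_t(x_t) \rangle$ is the quantity to track. Writing $a_t = \langle v_t, \nabla f_t(x_t) \rangle$ (shorthand introduced here, not taken from the paper), a direct calculation from the given form of $g_t$ yields:

```latex
g_t = \nabla f_t(x_t) + \frac{2|a_t|}{\|v_t\|_2^2}\, v_t
\quad\Longrightarrow\quad
\langle g_t, \nabla f_t(x_t) \rangle
  = \|\nabla f_t(x_t)\|_2^2 + \frac{2|a_t|\, a_t}{\|v_t\|_2^2}
  = \begin{cases}
      \|\nabla f_t(x_t)\|_2^2 + \dfrac{2a_t^2}{\|v_t\|_2^2}, & a_t \ge 0,\\[6pt]
      \|\nabla f_t(x_t)\|_2^2 - \dfrac{2a_t^2}{\|v_t\|_2^2}, & a_t < 0.
    \end{cases}
```

So the extra term strengthens the alignment of $g_t$ with the stochastic gradient when $a_t \ge 0$ and weakens it when $a_t < 0$. This is only the elementary identity; how Reject Accelerating and Random Vector Accelerating exploit it is not spelled out in the abstract.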
( 2
min )
Dose-Volume Histogram (DVH) prediction is fundamental in radiation therapy, facilitating treatment planning, dose evaluation, plan comparison, and more. It improves the ability to deliver precise and effective radiation treatments while managing potential toxicities to healthy tissues, reducing the risk of complications. This paper extends research findings recently presented at the AAPM 65th Annual Meeting & Exhibition and includes the necessary technical details. The objective is to design efficient deep learning models for DVH prediction on a general radiotherapy platform equipped with a high-performance CBCT system, where the input CT images and the target dose images to be predicted may have different origins, spacings, and sizes. Deep learning models widely adopted for the DVH prediction task are evaluated on this novel radiotherapy platform, and graph neural networks (GNNs) are shown to be the ideal architecture for constructing a plug-and-play framework that improves the predictive performance of base deep learning models in the adaptive setting.
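For readers new to the quantity being predicted: a cumulative DVH reports, for each dose level, the fraction of a structure's volume that receives at least that dose. The minimal sketch below computes it from a dose grid and a binary structure mask; the function name and the equal-voxel-volume assumption are illustrative choices, and the sketch says nothing about the paper's GNN models.

```python
import numpy as np

def cumulative_dvh(dose, mask, bins):
    """Cumulative DVH: fraction of a structure's voxels receiving at least each dose level.

    dose: array of voxel doses (e.g. in Gy); mask: boolean array of the same shape
    selecting the structure (e.g. an organ at risk); bins: 1-D sequence of dose levels.
    Assumes all voxels have equal volume, so volume fractions are voxel-count fractions.
    """
    structure_doses = np.asarray(dose)[np.asarray(mask, dtype=bool)]
    if structure_doses.size == 0:
        raise ValueError("mask selects no voxels")
    # For each dose level d, the share of structure voxels with dose >= d.
    return np.array([(structure_doses >= d).mean() for d in bins])
```

In practice the dose grid and the structure mask come from differently sampled images (the spacing and origin issue the abstract mentions), so they would first need to be resampled onto a common grid.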
( 2
min )
We make the case for neural network objects and extend an existing neural network calculus, explained in detail in Chapter 2 of \cite{bigbook}. Our aim is to show that it does indeed make sense to talk about neural network polynomials, neural network exponentials, sines, and cosines, in the sense that they approximate their real-number counterparts subject to limitations on certain of their parameters, $q$ and $\varepsilon$. In doing so, we show that parameter and depth growth are only polynomial in the desired accuracy (defined as a 1-norm difference over $\mathbb{R}$), thereby showing that this approach to approximation, in which a neural network in some sense shares the structural properties of the function it approximates, is not entirely intractable.
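To illustrate how a ReLU network can mimic a polynomial with modest growth in size, the sketch below uses Yarotsky's well-known sawtooth construction for $x^2$ on $[0, 1]$. This is a swapped-in textbook example, not necessarily the construction used in \cite{bigbook}: with $m$ composed tent maps the network has depth growing linearly in $m$, while the sup error is at most $2^{-2m-2}$, so depth and parameter count grow only logarithmically in $1/\varepsilon$.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def tent(x):
    # Tent map built from three ReLUs: 2x on [0, 1/2], 2(1 - x) on [1/2, 1], 0 beyond 1.
    return 2 * relu(x) - 4 * relu(x - 0.5) + 2 * relu(x - 1.0)

def relu_square(x, m):
    """ReLU-network approximation of x**2 on [0, 1] (Yarotsky's sawtooth construction).

    f_m(x) = x - sum_{s=1}^{m} g_s(x) / 4**s, where g_s is the tent map composed s times.
    f_m is the piecewise-linear interpolant of x**2 at the points k / 2**m, so the
    sup error on [0, 1] is at most 2**(-2m - 2); each extra term adds one tent layer.
    """
    x = np.asarray(x, dtype=float)
    out = x.copy()
    g = x
    for s in range(1, m + 1):
        g = tent(g)               # g_s = tent applied s times
        out = out - g / 4.0 ** s
    return out
```

Halving the error bound costs roughly half an extra layer here, which is the kind of mild accuracy-versus-size trade-off the abstract is concerned with.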
( 2
min )
This study introduces a two-scale Graph Neural Operator (GNO), namely, LatticeGraphNet (LGN), designed as a surrogate model for costly nonlinear finite-element simulations of three-dimensional latticed parts and structures. LGN has two networks: LGN-i, learning the reduced dynamics of lattices, and LGN-ii, learning the mapping from the reduced representation onto the tetrahedral mesh. LGN can predict deformation for arbitrary lattices, hence the name operator. Our approach significantly reduces inference time while maintaining high accuracy for unseen simulations, establishing the use of GNOs as efficient surrogate models for evaluating mechanical responses of lattices and structures.
( 2
min )
Launched in 2021, Amazon SageMaker Canvas is a visual, point-and-click service for building and deploying machine learning (ML) models without the need to write any code. Ready-to-use Foundation Models (FMs) available in SageMaker Canvas enable customers to use generative AI for tasks such as content generation and summarization. We are thrilled to announce the latest […]
( 6
min )
This is a guest post co-authored by Ajay K Gupta, Jean Felipe Teotonio and Paul A Churchyard from HSR.health. HSR.health is a geospatial health risk analytics firm whose vision is that global health challenges are solvable through human ingenuity and the focused and accurate application of data analytics. In this post, we present one approach […]
( 13
min )
AI is reshaping industries, society and the “very fabric of innovation” — and Canada is poised to play a key role in this global transformation, said NVIDIA founder and CEO Jensen Huang during a fireside chat with leaders from across Canada’s thriving AI ecosystem. “Canada, as you know, even though you’re so humble, you might
( 6
min )
A new study underscores the potential of AI and accelerated computing to deliver energy efficiency and combat climate change, efforts in which NVIDIA has long been deeply engaged. The study, called “Rethinking Concerns About AI’s Energy Use,” provides a well-researched examination into how AI can — and in many cases already does — play a
( 7
min )
Image by Cathrin2014 from Pixabay In July 2023, Teresa Tung, managing director and cloud-first chief technologist at Accenture, gave a Factory of the Future talk at the Databricks Data + AI Summit on digital twins, knowledge graphs, and generative AI for warehouse automation. Two points she made that resonated with me: 1) Digital twins are…
The post Digital twins, interoperability and FAIR model-driven development appeared first on Data Science Central.
( 22
min )
Let’s dive into the cloud, but not just any cloud: the cloud of the future, specifically the realm of cloud security in 2024. We’re not just talking about your everyday, run-of-the-mill updates here. We’re looking at the big players, the game changers, the trends that are going to set the stage for how we protect our…
The post 5 trends & advances that are set to define cloud security in 2024 appeared first on Data Science Central.
( 21
min )
Exploiting the symmetry within datasets, MIT researchers show, can decrease the amount of data needed for training neural networks.
( 7
min )
Dermatologists and general practitioners are somewhat less accurate in diagnosing disease in darker skin, a new study finds. Used correctly, AI may be able to help.
( 7
min )
AI Weirdness: the strange side of machine learning
( 2
min )
One of the most useful application patterns for generative AI workloads is Retrieval Augmented Generation (RAG). In the RAG pattern, we find pieces of reference content related to an input prompt by performing similarity searches on embeddings. Embeddings capture the information content in bodies of text, allowing natural language processing (NLP) models to work with […]
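The retrieval step described above (similarity search over embeddings) can be sketched in a few lines. This minimal version assumes embeddings are already available as NumPy arrays and ranks documents by cosine similarity; `build_prompt` is a hypothetical template, not the post's or any AWS service's API.

```python
import numpy as np

def top_k_similar(query_embedding, doc_embeddings, k=3):
    """Indices and cosine similarities of the k documents closest to the query (best first)."""
    q = np.asarray(query_embedding, dtype=float)
    docs = np.asarray(doc_embeddings, dtype=float)
    # Normalise so that dot products equal cosine similarities.
    q = q / np.linalg.norm(q)
    docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    scores = docs @ q
    order = np.argsort(-scores)[:k]
    return order, scores[order]

def build_prompt(question, passages):
    """Hypothetical prompt template: the retrieved passages are prepended as context."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

A production system would replace the brute-force dot product with a vector index, but the ranking criterion is the same.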
( 18
min )
In this paper, we prove the universal consistency of wide and deep ReLU neural network classifiers trained on the logistic loss. We also give sufficient conditions for a class of probability measures for which classifiers based on neural networks achieve minimax optimal rates of convergence. The result applies to a wide range of known function classes. In particular, while most previous works impose explicit smoothness assumptions on the regression function, our framework encompasses more general settings. The proposed neural networks are either the minimizers of the logistic loss or the $0$-$1$ loss. In the former case, they are interpolating classifiers that exhibit a benign overfitting behavior.
( 2
min )
Understanding Origin-Destination (O-D) travel demand is crucial for transportation management. However, traditional spatial-temporal deep learning models grapple with addressing the sparse and long-tail characteristics in high-resolution O-D matrices and quantifying prediction uncertainty. This dilemma arises from the numerous zeros and over-dispersed demand patterns within these matrices, which challenge the Gaussian assumption inherent to deterministic deep learning models. To address these challenges, we propose a novel approach: the Spatial-Temporal Tweedie Graph Neural Network (STTD). The STTD introduces the Tweedie distribution as a compelling alternative to the traditional 'zero-inflated' model and leverages spatial and temporal embeddings to parameterize travel demand distributions. Our evaluations using real-world datasets highlight STTD's superiority in providing accurate predictions and precise confidence intervals, particularly in high-resolution scenarios.
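Why the Tweedie family suits sparse demand: for power $1 < p < 2$ it is a compound Poisson-gamma distribution with an exact point mass at zero plus a continuous positive part. The sketch below shows that zero probability and the standard Tweedie unit deviance; the abstract does not give STTD's exact parameterisation or loss, so both functions are illustrative assumptions rather than the paper's model.

```python
import math

def tweedie_zero_prob(mu, phi, p):
    """P(Y = 0) for a Tweedie variable with mean mu, dispersion phi and power 1 < p < 2.

    Y is a sum of Poisson(lam) gamma-distributed terms, so Y = 0 exactly when the
    Poisson count is 0, with lam = mu**(2 - p) / (phi * (2 - p)).
    """
    if not 1 < p < 2:
        raise ValueError("the compound Poisson-gamma case requires 1 < p < 2")
    lam = mu ** (2 - p) / (phi * (2 - p))
    return math.exp(-lam)

def tweedie_deviance(y, mu, p):
    """Unit deviance for 1 < p < 2: zero when y == mu and well defined at y == 0."""
    return 2 * (
        y ** (2 - p) / ((1 - p) * (2 - p))
        - y * mu ** (1 - p) / (1 - p)
        + mu ** (2 - p) / (2 - p)
    )
```

Smaller predicted means give a larger probability of an exact zero, which is how a single distribution covers both empty O-D pairs and busy ones without a separate zero-inflation component.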
( 2
min )
We present a theoretical foundation regarding the boundedness of the t-SNE algorithm. t-SNE uses gradient descent with the Kullback-Leibler (KL) divergence as its objective function, seeking a set of low-dimensional points whose pairwise affinities closely resemble those of the original data points in a high-dimensional space. Investigating t-SNE properties such as perplexity and affinity under a weak convergence assumption on the sampled dataset, we examine the behavior of points generated by t-SNE under continuous gradient flow. After demonstrating that the points generated by t-SNE remain bounded, we leverage this insight to establish the existence of a minimizer of the KL divergence.
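The objective and iteration described above can be made concrete. The sketch assumes the high-dimensional joint affinities `P` are already computed (perplexity calibration is omitted), uses the standard Student-t kernel for the low-dimensional affinities, and runs plain gradient descent, a discretisation of the gradient flow studied in the paper; practical t-SNE adds momentum and early exaggeration, which are left out here.

```python
import numpy as np

def tsne_kl_and_grad(P, Y):
    """KL(P || Q) and its gradient with respect to the embedding Y.

    P: symmetric (n, n) joint affinities with zero diagonal, summing to 1.
    Y: (n, d) low-dimensional points. Q uses the Student-t kernel
    w_ij = (1 + ||y_i - y_j||^2)^-1, normalised over all pairs i != j.
    """
    diff = Y[:, None, :] - Y[None, :, :]            # diff[i, j] = y_i - y_j
    W = 1.0 / (1.0 + np.sum(diff ** 2, axis=-1))     # Student-t kernel
    np.fill_diagonal(W, 0.0)
    Q = W / W.sum()
    mask = P > 0
    kl = np.sum(P[mask] * np.log(P[mask] / Q[mask]))
    # Standard t-SNE gradient: 4 * sum_j (p_ij - q_ij) * w_ij * (y_i - y_j)
    grad = 4.0 * np.sum(((P - Q) * W)[:, :, None] * diff, axis=1)
    return kl, grad

def tsne_gradient_descent(P, Y0, lr=0.1, steps=300):
    """Plain gradient descent on KL(P || Q), i.e. an explicit Euler scheme for the gradient flow."""
    Y = np.array(Y0, dtype=float)
    for _ in range(steps):
        _, g = tsne_kl_and_grad(P, Y)
        Y -= lr * g
    return Y
```

The attractive terms $p_{ij} w_{ij}$ pull every pair together whenever $p_{ij} > 0$, which gives some intuition for why the iterates stay bounded.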
( 2
min )
Score-based generative modeling with probability flow ordinary differential equations (ODEs) has achieved remarkable success in a variety of applications. While various fast ODE-based samplers have been proposed in the literature and employed in practice, the theoretical understanding of the convergence properties of the probability flow ODE is still quite limited. In this paper, we provide the first non-asymptotic convergence analysis for a general class of probability flow ODE samplers in 2-Wasserstein distance, assuming accurate score estimates. We then consider various examples and establish results on the iteration complexity of the corresponding ODE-based samplers.
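A toy example helps show what a probability flow ODE sampler is and how its output is measured in 2-Wasserstein distance. The sketch assumes an Ornstein-Uhlenbeck forward process and a one-dimensional Gaussian data distribution, so the score is known exactly (mirroring the "accurate score estimates" assumption); the forward process, step count and target are illustrative choices, not the paper's setting.

```python
import numpy as np

def pf_ode_sample(n_samples, m0=2.0, s0=0.5, T=5.0, n_steps=1000, seed=0):
    """Euler sampler for the probability flow ODE of the OU process dX = -X dt + sqrt(2) dW,
    with data distribution N(m0, s0**2).

    The marginals are Gaussian, X_t ~ N(m0 e^{-t}, s0^2 e^{-2t} + 1 - e^{-2t}), so the
    score is exact. The ODE dx/dt = -x - grad log p_t(x) is integrated from T back to 0.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_samples)    # p_T is close to the stationary law N(0, 1) for large T
    dt = T / n_steps
    for k in range(n_steps):
        t = T - k * dt
        mean_t = m0 * np.exp(-t)
        var_t = s0 ** 2 * np.exp(-2 * t) + 1.0 - np.exp(-2 * t)
        score = -(x - mean_t) / var_t        # exact score of the Gaussian marginal p_t
        velocity = -x - score                # right-hand side of the probability flow ODE
        x = x - dt * velocity                # explicit Euler step from t to t - dt
    return x

def w2_gaussian_1d(mean1, std1, mean2, std2):
    """2-Wasserstein distance between two one-dimensional Gaussians."""
    return np.sqrt((mean1 - mean2) ** 2 + (std1 - std2) ** 2)
```

Comparing the empirical mean and standard deviation of the samples with the target through `w2_gaussian_1d` gives a rough check; coarser step counts increase the discretisation error, which is the kind of dependence an iteration-complexity bound quantifies.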
( 2
min )
We present a novel approach for differentially private data synthesis of protected tabular datasets, a relevant task in highly sensitive domains such as healthcare and government. Current state-of-the-art methods predominantly use marginal-based approaches, where a dataset is generated from private estimates of the marginals. In this paper, we introduce PrivPGD, a new generation method for marginal-based private data synthesis, leveraging tools from optimal transport and particle gradient descent. Our algorithm outperforms existing methods on a large range of datasets while being highly scalable and offering the flexibility to incorporate additional domain-specific constraints.
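The input to marginal-based synthesis is a set of privately estimated marginals. The abstract does not say which mechanism PrivPGD uses, so the sketch below assumes the classical Gaussian mechanism with noise scale $\sqrt{2\ln(1.25/\delta)}\,\Delta/\varepsilon$ (valid for $0 < \varepsilon < 1$) and integer-coded columns; it covers only this measurement step, not the particle gradient descent that fits synthetic data to the noisy marginals.

```python
import numpy as np

def gaussian_sigma(epsilon, delta, l2_sensitivity=1.0):
    """Classical Gaussian-mechanism noise scale, valid for 0 < epsilon < 1."""
    if not 0 < epsilon < 1:
        raise ValueError("this calibration assumes 0 < epsilon < 1")
    return np.sqrt(2 * np.log(1.25 / delta)) * l2_sensitivity / epsilon

def noisy_two_way_marginal(data, col_a, col_b, sizes, epsilon, delta, seed=0):
    """Private estimate of the (col_a, col_b) contingency table of integer-coded data.

    Adding or removing one record changes exactly one cell by 1, so the
    L2 sensitivity of a single marginal is 1.
    """
    counts = np.zeros((sizes[col_a], sizes[col_b]))
    np.add.at(counts, (data[:, col_a], data[:, col_b]), 1)
    sigma = gaussian_sigma(epsilon, delta)
    rng = np.random.default_rng(seed)
    return counts + rng.normal(0.0, sigma, size=counts.shape)
```

Releasing several marginals spends privacy budget on each, so the per-marginal $(\varepsilon, \delta)$ must be set through a composition argument.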
( 2
min )
Neutrinos can undergo fast flavor conversions (FFCs) within extremely dense astrophysical environments such as core-collapse supernovae (CCSNe) and neutron star mergers (NSMs). In this study, we explore FFCs in a \emph{multi-energy} neutrino gas, revealing that when the FFC growth rate significantly exceeds that of the vacuum Hamiltonian, all neutrinos (regardless of energy) share a common survival probability dictated by the energy-integrated neutrino spectrum. We then employ physics-informed neural networks (PINNs) to predict the asymptotic outcomes of FFCs within such a multi-energy neutrino gas. These predictions are based on the first two moments of neutrino angular distributions for each energy bin, typically available in state-of-the-art CCSN and NSM simulations. Our PINNs achieve errors of $\lesssim 6\%$ in predicting the number of neutrinos in the electron channel and relative absolute errors of $\lesssim 18\%$ in the neutrino moments.
( 2
min )
Resilience plays a pivotal role in the development of any workload, and generative AI workloads are no different. There are unique considerations when engineering generative AI workloads through a resilience lens. Understanding and prioritizing resilience is crucial for generative AI workloads to meet organizational availability and business continuity requirements. In this post, we discuss the […]
( 8
min )
Data is the foundation for capturing the maximum value from AI technology and solving business problems quickly. To unlock the potential of generative AI technologies, however, there’s a key prerequisite: your data needs to be appropriately prepared. In this post, we describe how to use generative AI to update and scale your data pipeline using Amazon […]
( 6
min )
Embeddings play a key role in natural language processing (NLP) and machine learning (ML). Text embedding refers to the process of transforming text into numerical representations that reside in a high-dimensional vector space. This technique is achieved through the use of ML algorithms that enable the understanding of the meaning and context of data (semantic […]
( 9
min )
Image by Ahmad Ardity from Pixabay The good news is that the data science community is taking more of an interest in knowledge graphs lately. But unsurprisingly, some data science folks exploring graphs themselves are barely scratching the surface of knowledge graph potential. Until data scientists view the root problem to be solved through the…
The post What data scientists overlook when it comes to knowledge graphs appeared first on Data Science Central.
( 22
min )
GeForce NOW is celebrating its fourth anniversary all month (plus an extra day for leap year) during February’s GFN Thursdays, with 2 new games joining the cloud. Keep an eye out for more new games and other announcements for members to come. Diablo IV and Overwatch 2 heat up the cloud this GFN…
( 7
min )
This study investigates self-supervised learning techniques for obtaining representations of event sequences, a key modality in various applications, including but not limited to banking, e-commerce, and healthcare.
We perform a comprehensive study of generative and contrastive approaches to self-supervised learning, applying each of them independently. We find that no single method is superior. Consequently, we explore the potential benefits of combining these approaches. To this end, we introduce a novel method that aligns generative and contrastive embeddings as distinct modalities, drawing inspiration from contemporary multimodal research.
Generative and contrastive approaches are often treated as mutually exclusive, leaving their combination largely unexplored. Our results demonstrate that this aligned model performs at least on par with, and mostly surpasses, existing methods and is more universal across a variety of tasks. Furthermore, we demonstrate that self-supervised methods consistently outperform the supervised approach on our datasets.
( 2
min )
There are now many explainable AI methods for understanding the decisions of a machine learning model. Among them are methods based on counterfactual reasoning, which involve simulating feature changes and observing their impact on the prediction. This article proposes to view this simulation process as a source of knowledge that can be stored and later used in different ways. The process is illustrated for additive models and, more specifically, for the naive Bayes classifier, whose properties of interest for this purpose are shown.
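The additive structure is what makes simulated feature changes reusable: in naive Bayes the log-odds is a prior term plus one term per feature, so the effect of switching one feature value does not depend on the other features and can be computed once and stored. The sketch below assumes binary classes, categorical integer-coded features and Laplace smoothing; the class and method names are illustrative, not the article's.

```python
import numpy as np

class CategoricalNaiveBayes:
    """Binary-class naive Bayes over categorical features with Laplace smoothing."""

    def __init__(self, n_values, alpha=1.0):
        self.n_values = n_values   # number of categories of each feature
        self.alpha = alpha

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        self.prior = np.log(np.mean(y == 1)) - np.log(np.mean(y == 0))
        # contrib[i][v] = log P(x_i = v | y = 1) - log P(x_i = v | y = 0)
        self.contrib = []
        for i, k in enumerate(self.n_values):
            probs = []
            for c in (0, 1):
                counts = np.bincount(X[y == c, i], minlength=k) + self.alpha
                probs.append(counts / counts.sum())
            self.contrib.append(np.log(probs[1]) - np.log(probs[0]))
        return self

    def log_odds(self, x):
        """log P(y=1 | x) - log P(y=0 | x): a prior term plus one additive term per feature."""
        return self.prior + sum(self.contrib[i][v] for i, v in enumerate(x))

    def counterfactual_effect(self, feature, old, new):
        """Change in log-odds when `feature` switches from `old` to `new`, whatever the other features are."""
        return self.contrib[feature][new] - self.contrib[feature][old]
```

A table of `counterfactual_effect` values over all features and value pairs is one concrete form the stored knowledge could take.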
( 2
min )
This paper explores optimal service resource management strategies, a continuing challenge for health information services seeking to enhance service performance, optimise service resource utilisation, and deliver interactive health information services. An adaptive optimal service resource management strategy was developed based on a value co-creation model of health information services, with a focus on collaboration and interaction with users. A deep reinforcement learning algorithm was embedded in the Internet of Things (IoT)-based health information service system (I-HISS) to allocate service resources by controlling service provision and service adaptation based on user engagement behaviour. Simulation experiments were conducted to evaluate the proposed algorithm under different user reactions to the health information service.
( 2
min )
Prompt design and engineering has become an important discipline in just the past few months. In this paper, we provide an introduction to the main concepts and design approaches. We also cover more advanced techniques, up to those needed to design LLM-based agents. We conclude with a list of existing tools for prompt engineering.
( 2
min )
Quantum computing shows great potential, but errors pose a significant challenge. This study explores new strategies for mitigating quantum errors using artificial neural networks (ANN) and the Yang-Baxter equation (YBE). Unlike traditional error correction methods, which are computationally intensive, we investigate artificial error mitigation. The manuscript introduces the basics of quantum error sources and explores the potential of using classical computation for error mitigation. The Yang-Baxter equation plays a crucial role, allowing us to compress time dynamics simulations into constant-depth circuits. By introducing controlled noise through the YBE, we enhance the dataset for error mitigation. We train an ANN model on partial data from quantum simulations, demonstrating its effectiveness in correcting errors in time-evolving quantum states.
( 2
min )
Active learning strategies for 3D object detection in autonomous driving datasets may help to address challenges of data imbalance, redundancy, and high-dimensional data. We demonstrate the effectiveness of entropy querying to select informative samples, aiming to reduce annotation costs and improve model performance. We experiment using the BEVFusion model for 3D object detection on the nuScenes dataset, comparing active learning to random sampling and demonstrating that entropy querying outperforms in most cases. The method is particularly effective in reducing the performance gap between majority and minority classes. Class-specific analysis reveals efficient allocation of annotated resources for limited data budgets, emphasizing the importance of selecting diverse and informative data for model training. Our findings suggest that entropy querying is a promising strategy for selecting data that enhances model learning in resource-constrained environments.
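Entropy querying ranks unlabeled samples by the Shannon entropy of the model's predicted class distribution and sends the most uncertain ones for annotation. The sketch below assumes per-box class probabilities are available for each unlabeled scene and aggregates them by averaging; that aggregation rule is a hypothetical choice, since the abstract does not state how box-level uncertainty is combined for BEVFusion.

```python
import numpy as np

def predictive_entropy(probs, eps=1e-12):
    """Shannon entropy (nats) of each row of an (n, n_classes) probability matrix."""
    p = np.clip(np.asarray(probs, dtype=float), eps, 1.0)
    return -np.sum(p * np.log(p), axis=1)

def scene_scores(box_probs_per_scene):
    """Hypothetical aggregation: mean entropy of a scene's predicted boxes (0 if it has none)."""
    return np.array([predictive_entropy(b).mean() if len(b) else 0.0 for b in box_probs_per_scene])

def select_top(scores, budget):
    """Indices of the `budget` highest-scoring unlabeled items, i.e. the next annotation batch."""
    return np.argsort(-np.asarray(scores))[:budget]
```

Random sampling, the baseline in the abstract, corresponds to replacing `select_top` with a uniform draw of `budget` indices.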
( 2
min )
As legal case law databases such as HUDOC continue to grow rapidly, it has become essential for legal researchers to find efficient methods to handle such large-scale data sets. Such case law databases usually consist of the textual content of cases together with the citations between them. This paper focuses on case law from the European Court of Human Rights on Article 8 of the European Convention on Human Rights, the right to respect for private and family life, home and correspondence. In this study, we demonstrate and compare the potential of topic modelling and citation network analysis to find and organize case law on Article 8 based on general themes and citation patterns, respectively. Additionally, we explore whether combining these two techniques leads to better results than applying either method alone. We evaluate the effectiveness of the combined method on a unique, manually collected and annotated dataset of Article 8 case law on evictions. The results of our experiments show that our combined (text- and citation-based) approach provides the best results in finding and grouping case law, providing scholars with an effective way to extract and analyse relevant cases on a specific issue.
( 3
min )
This paper presents a comprehensive comparative analysis of explainable artificial intelligence (XAI) ensembling methods. Our research brings three significant contributions. Firstly, we introduce a novel ensembling method, NormEnsembleXAI, that leverages minimum, maximum, and average functions in conjunction with normalization techniques to enhance interpretability. Secondly, we offer insights into the strengths and weaknesses of XAI ensemble methods. Lastly, we provide a library, facilitating the practical implementation of XAI ensembling, thus promoting the adoption of transparent and interpretable deep learning models.
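The ensembling idea (normalise several attribution maps, then combine them element-wise with minimum, maximum or average) can be sketched as follows. Min-max scaling to $[0, 1]$ is an assumed normalisation; NormEnsembleXAI's actual normalisation techniques and the accompanying library's interface may differ.

```python
import numpy as np

def minmax_normalize(attr, eps=1e-12):
    """Rescale one attribution map to [0, 1] (one possible normalisation)."""
    attr = np.asarray(attr, dtype=float)
    lo, hi = attr.min(), attr.max()
    return (attr - lo) / (hi - lo + eps)

def ensemble_attributions(attributions, how="avg"):
    """Normalise each explanation, then combine them element-wise with min, max or mean."""
    stacked = np.stack([minmax_normalize(a) for a in attributions])
    reducers = {"min": np.min, "max": np.max, "avg": np.mean}
    return reducers[how](stacked, axis=0)
```

Normalising first matters because different XAI methods produce attributions on very different scales; without it, the method with the largest raw values would dominate the average. Note that min-max scaling also moves the zero point of signed attributions.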
( 2
min )
This article proposes a test procedure that can be used to test ML models and ML-based systems independently of the actual training process. In this way, typical quality statements about these models and systems, such as accuracy and precision, can be verified independently, taking into account their black-box character and the inherent stochastic properties of ML models and their training data. The article presents first results from a set of test experiments and suggests extensions to existing test methods that reflect the stochastic nature of ML models and ML-based systems.
( 2
min )
This paper studies Bayesian optimization with noise-free observations. We introduce new algorithms rooted in scattered data approximation that rely on a random exploration step to ensure that the fill-distance of query points decays at a near-optimal rate. Our algorithms retain the ease of implementation of the classical GP-UCB algorithm and satisfy cumulative regret bounds that nearly match those conjectured in arXiv:2002.05096, hence solving a COLT open problem. Furthermore, the new algorithms outperform GP-UCB and other popular Bayesian optimization strategies in several examples.
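The fill distance mentioned above is $h = \sup_{x \in \Omega} \min_i \|x - x_i\|$, the radius of the largest gap left by the query points. The sketch approximates the supremum over a dense grid of the domain and shows how uniformly random exploration points shrink it; it assumes $\Omega = [0, 1]^d$ and does not reproduce the paper's algorithms.

```python
import numpy as np

def fill_distance(points, grid):
    """Approximate h = sup_{x in domain} min_i ||x - x_i||.

    points: (n, d) query points; grid: (m, d) dense sample of the domain used for the sup.
    """
    points = np.asarray(points, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # dists[g, i] = distance from grid point g to query point i
    dists = np.linalg.norm(grid[:, None, :] - points[None, :, :], axis=-1)
    return dists.min(axis=1).max()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grid = np.linspace(0.0, 1.0, 1001)[:, None]   # domain [0, 1], d = 1
    for n in (10, 100, 1000):
        print(n, fill_distance(rng.random((n, 1)), grid))
```

In one dimension, uniform random points achieve a fill distance of order $\log n / n$, close to the $1/n$ of an evenly spaced design, which is the intuition behind using a random exploration step.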
( 2
min )
Deep learning models have demonstrated promising results in treatment effect estimation (TEE). However, most of them overlook the variations in treatment outcomes among subgroups with distinct characteristics. This limitation hinders their ability to provide accurate estimations and treatment recommendations for specific subgroups. In this study, we introduce a novel neural network-based framework, named SubgroupTE, which incorporates subgroup identification and treatment effect estimation. SubgroupTE identifies diverse subgroups and simultaneously estimates treatment effects for each subgroup, improving treatment effect estimation by considering the heterogeneity of treatment responses. Comparative experiments on synthetic data show that SubgroupTE outperforms existing models in treatment effect estimation. Furthermore, experiments on a real-world dataset related to opioid use disorder (OUD) demonstrate the potential of our approach to enhance personalized treatment recommendations for OUD patients.
( 2
min )
The world is looking for clean and renewable energy sources that do not pollute the environment, in an attempt to reduce the greenhouse gas emissions that contribute to global warming. Wind energy has significant potential not only to reduce greenhouse gas emissions but also to meet the ever-increasing demand for energy. To enable the effective utilization of wind energy, it is crucial to address three challenges in wind data analysis. First, improving data resolution across various climate conditions to ensure an ample supply of information for assessing potential energy resources. Second, implementing dimensionality reduction techniques for data collected from sensors or simulations to efficiently manage and store large datasets. Third, extrapolating wind data from one spatial specification to another, particularly where data acquisition may be impractical or costly. We propose a deep learning based approach to achieve multi-modal, continuous-resolution wind data prediction from discontinuous wind data, along with data dimensionality reduction.
( 2
min )
Activity detection is an important task in next-generation grant-free multiple access. While a number of existing algorithms are designed for this purpose, they mostly require precise information about the network, such as large-scale fading coefficients, small-scale fading channel statistics, noise variance at the access points, and user activity probability. Acquiring this information incurs significant overhead, and the estimated values might not be accurate. This problem is even more severe in cell-free networks, as there are many such parameters to acquire. Therefore, this paper investigates the activity detection problem without the above-mentioned information. To handle so many unknown parameters, this paper employs a Bayesian approach, in which the unknown variables are endowed with prior distributions that effectively act as regularizers. Together with the likelihood function, a maximum a posteriori (MAP) estimator and a variational inference algorithm are derived. Extensive simulations demonstrate that the proposed methods, even without knowledge of these system parameters, perform better than existing state-of-the-art methods, such as covariance-based and approximate message passing methods.
Deep Neural Network (DNN) models deployed on executing devices as inference engines are susceptible to Fault Injection Attacks (FIAs). These attacks manipulate model parameters to disrupt inference execution, with disastrous effects on performance. This work introduces Contrastive Learning (CL) of visual representations, a self-supervised learning approach, into the deep learning training and inference pipeline to implement DNN inference engines that are self-resilient under FIAs. Our proposed CL-based FIA Detection and Recovery (CFDR) framework features (i) real-time detection with only a single batch of testing data and (ii) fast recovery that remains effective even with only a small amount of unlabeled testing data. Evaluated on the CIFAR-10 dataset under multiple types of FIAs, CFDR shows promising detection and recovery effectiveness.
This paper introduces the multivariate beta mixture model (MBMM), a new probabilistic model for soft clustering. MBMM adapts to diverse cluster shapes because of the flexible probability density function of the multivariate beta distribution. We introduce the properties of MBMM, describe the parameter learning procedure, and present experimental results showing that MBMM fits diverse cluster shapes on synthetic and real datasets. The code is released anonymously at https://github.com/hhchen1105/mbmm/.
This work evaluates interpretability methods for time-series deep learning. Sensitivity analysis, which assesses how input changes affect the output, is a key component of interpretation. Post-hoc interpretation methods include back-propagation, perturbation, and approximation. Of these, my work investigates perturbation-based sensitivity analysis (SA) methods on modern Transformer models and benchmarks their performance. Specifically, my work answers three research questions: 1) Do different SA methods yield comparable outputs and attribute importance rankings? 2) Using the same SA method, do different deep learning (DL) models affect the output of the sensitivity analysis? 3) How well do the results of SA methods align with the ground truth?
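The sketch below illustrates what a perturbation-based SA method computes. It is a minimal, generic occlusion-style example, not the benchmark code from this work. Each time step is replaced in turn by a baseline value, and the step is scored by how far the model output moves. The scores then yield an importance ranking of the kind compared in research question 1. The mean-value baseline, the function names, and the toy linear "model" are all assumptions made for illustration.

```python
from typing import Callable, List, Optional, Sequence


def occlusion_sensitivity(
    model: Callable[[Sequence[float]], float],
    series: Sequence[float],
    baseline: Optional[float] = None,
) -> List[float]:
    """Score each time step by how much the output moves when that step is replaced."""
    if baseline is None:
        # Assumption: occlude with the series mean; zeros or noise are common alternatives.
        baseline = sum(series) / len(series)
    reference = model(series)
    scores = []
    for t in range(len(series)):
        perturbed = list(series)
        perturbed[t] = baseline
        scores.append(abs(reference - model(perturbed)))
    return scores


def importance_ranking(scores: Sequence[float]) -> List[int]:
    """Time-step indices ordered from most to least influential."""
    return sorted(range(len(scores)), key=lambda t: scores[t], reverse=True)


# Toy "model": a fixed linear read-out, so the true importance of each step is known.
weights = [0.1, 2.0, -0.5, 0.0]


def linear_model(x: Sequence[float]) -> float:
    return sum(w * v for w, v in zip(weights, x))


scores = occlusion_sensitivity(linear_model, [1.0, 2.0, 3.0, 4.0])
print(importance_ranking(scores))  # [1, 2, 0, 3]
```

Swapping in a Transformer forecaster for `linear_model` leaves the procedure unchanged. Running two SA methods on the same input and comparing their rankings corresponds to the first research question.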
This paper presents a modeling effort to explore the underlying physics of temperature evolution during additive friction stir deposition (AFSD) by a human-AI teaming approach. AFSD is an emerging solid-state additive manufacturing technology that deposits materials without melting. However, both process modeling and modeling of the AFSD tool are at an early stage. In this paper, a human-AI teaming approach is proposed to combine models based on first principles with AI. The resulting human-informed machine learning method, denoted as AFSD-Physics, can effectively learn the governing equations of temperature evolution at the tool and the build from in-process measurements. Experiments are designed and conducted to collect in-process measurements for the deposition of aluminum 7075 with a total of 30 layers. The acquired governing equations are physically interpretable models with low computational cost and high accuracy. Model predictions show good agreement with the measurements. Experimental validation with new process parameters demonstrates the model's generalizability and potential for use in tool temperature control and process optimization.
This paper studies Bayesian optimization with noise-free observations. We introduce new algorithms rooted in scattered data approximation that rely on a random exploration step to ensure that the fill-distance of query points decays at a near-optimal rate. Our algorithms retain the ease of implementation of the classical GP-UCB algorithm and satisfy cumulative regret bounds that nearly match those conjectured in arXiv:2002.05096, hence solving a COLT open problem. Furthermore, the new algorithms outperform GP-UCB and other popular Bayesian optimization strategies in several examples.
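The regret guarantees above rely on the fill distance of the query points, $h_X = \sup_{x \in \Omega} \min_i \|x - x_i\|$, decaying quickly. The sketch below shows this quantity concretely under an assumed domain of $\Omega = [0, 1]$. There it can be computed exactly from the sorted points: the farthest location is an endpoint or the midpoint of the widest gap. The uniform random draw stands in, hypothetically, for a random exploration step. It is not the paper's algorithm, which interleaves such steps with GP-UCB-style choices.

```python
import random
from typing import List, Sequence


def fill_distance_unit_interval(points: Sequence[float]) -> float:
    """Exact fill distance sup_{x in [0, 1]} min_i |x - x_i| for points in [0, 1]."""
    if not points:
        raise ValueError("need at least one query point")
    xs = sorted(points)
    # The farthest point of [0, 1] from the set is an endpoint or the middle of a gap.
    candidates = [xs[0] - 0.0, 1.0 - xs[-1]]
    candidates += [(b - a) / 2.0 for a, b in zip(xs, xs[1:])]
    return max(candidates)


def random_exploration_points(n: int, seed: int = 0) -> List[float]:
    """Hypothetical exploration step: n queries drawn uniformly at random on [0, 1]."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


for n in (10, 100, 1000):
    print(n, fill_distance_unit_interval(random_exploration_points(n)))
```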
A common forecasting setting in real-world applications involves a set of possibly heterogeneous time series from the same domain. Because each time series differs in properties such as length, obtaining forecasts for each individual series in a straightforward way is challenging. This paper proposes a general framework with three steps. It uses Dynamic Time Warping (DTW) as a similarity measure to find similar time series, builds neighborhoods from them in a k-Nearest Neighbor fashion, and improves the forecasts of possibly simple models by averaging. Several ways of performing the averaging are suggested, and theoretical arguments underline the usefulness of averaging for forecasting. Additionally, diagnostic tools are proposed that allow a deep understanding of the procedure.
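The neighborhood-and-average idea can be sketched in a few steps, under several assumptions. It uses the classic DTW recursion with an absolute-difference cost, which works for series of different lengths. It picks the k nearest series to the target and averages the target's one-step forecasts from a simple model fitted on the target and on each neighbor. The drift model, the averaging rule, and all function names are illustrative assumptions. The paper suggests several averaging variants, and this is only one plausible reading.

```python
import math
from typing import Callable, Dict, Sequence

Predictor = Callable[[Sequence[float]], float]


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping with absolute-difference cost."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def fit_drift(train: Sequence[float]) -> Predictor:
    """A deliberately simple model: last value plus the mean one-step change."""
    steps = [b - a for a, b in zip(train, train[1:])]
    drift = sum(steps) / len(steps)
    return lambda history: history[-1] + drift


def knn_averaged_forecast(
    target: str,
    collection: Dict[str, Sequence[float]],
    fit: Callable[[Sequence[float]], Predictor],
    k: int,
) -> float:
    """Average the target's forecasts from models fitted on it and its k DTW neighbours."""
    others = [name for name in collection if name != target]
    others.sort(key=lambda name: dtw_distance(collection[target], collection[name]))
    neighbourhood = [target] + others[:k]
    history = collection[target]
    return sum(fit(collection[name])(history) for name in neighbourhood) / len(neighbourhood)


data = {"a": [1.0, 2.0, 3.0, 4.0], "b": [1.0, 2.0, 3.0, 4.0, 5.0], "c": [10.0, 0.0, 10.0, 0.0]}
print(knn_averaged_forecast("a", data, fit_drift, k=1))  # 5.0
```

Here series "b" has a different length from "a" but a DTW distance of only 1.0, so it is chosen as the neighbor. The oscillating series "c" is excluded.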
Microsoft Research Forum is a new series of conversations that explore recent advances, bold new ideas, and important discussions within the global research community. Leading Microsoft researchers will share insights into their work, followed by live online discussions with audience participants. This post provides an overview of the inaugural Microsoft Research […]
The post Microsoft Research Forum: New series explores bold ideas in technology research in the era of AI appeared first on Microsoft Research.
In this post, we show you how to securely create a movie chatbot by implementing RAG with your own data using Knowledge Bases for Amazon Bedrock. We use the IMDb and Box Office Mojo dataset to simulate a catalog for media and entertainment customers and showcase how you can build your own RAG solution in just a couple of steps.
This post was co-written with Ricardo Perdigao, Solution Architecture Manager at Mendix, a Siemens business. Mendix, a Siemens business, offers the low-code platform with the vision and execution designed for today’s complex software development challenges. Since 2005, we’ve helped thousands of organizations worldwide reimagine how they develop applications with our platform’s cutting-edge capabilities. Mendix allows […]
In the first part of this three-part series, we presented a solution that demonstrates how you can automate detecting document tampering and fraud at scale using AWS AI and machine learning (ML) services for a mortgage underwriting use case. In this post, we present an approach to develop a deep learning-based computer vision model to […]
Data governance is more important than ever in e-commerce, where massive amounts of data are generated and processed daily. Big Data presents opportunities and challenges for e-commerce businesses, requiring a strategic approach to data quality, security, and compliance. This article discusses e-commerce data governance best practices, including understanding data governance, data quality, data security, compliance…
The post Mastering E-commerce data governance: Best practices, challenges, and future trends for quality, compliance, and growth appeared first on Data Science Central.
Here’s some news to still beating hearts: AI is helping bring some clarity to cardiology. Caristo Diagnostics has developed an AI-powered solution for detecting coronary inflammation in cardiac CT scans. In this episode of NVIDIA’s AI Podcast, Dr. Keith Channon, the Field Marshal Earl Alexander Professor at the University of Oxford, and the cofounder and […]
Asia’s lion city is roaring ahead in AI. Singtel, a leading communications services provider based in Singapore, will bring the NVIDIA AI platform to businesses in the island nation and beyond. The mobile and broadband company is building energy-efficient data centers across Southeast Asia accelerated with NVIDIA Hopper architecture GPUs and using NVIDIA AI reference […]
We’re developing a blueprint for evaluating the risk that a large language model (LLM) could aid someone in creating a biological threat. In an evaluation involving both biology experts and students, we found that GPT-4 provides at most a mild uplift in biological threat creation accuracy. While this uplift is not large enough to be conclusive, our finding is a starting point for continued research and community deliberation.
Recent advancements in biological research leverage the integration of
molecules, proteins, and natural language to enhance drug discovery. However,
current models exhibit several limitations, such as the generation of invalid
molecular SMILES, underutilization of contextual information, and equal
treatment of structured and unstructured knowledge. To address these issues, we
propose BioT5, a comprehensive pre-training framework that enriches
cross-modal integration in biology with chemical knowledge and natural language
associations. BioT5 utilizes SELFIES for $100\%$ robust molecular
representations and extracts knowledge from the surrounding context of
bio-entities in unstructured biological literature. Furthermore,
BioT5 distinguishes between structured and unstructured knowledge,
leading to more effective utilization of information. After fine-tuning, BioT5
shows superior performance across a wide range of tasks, demonstrating its
strong capability of capturing underlying relations and properties of
bio-entities. Our code is available at
https://github.com/QizhiPei/BioT5.
Rapid advancements in artificial intelligence (AI) technology have brought
about a plethora of new challenges in terms of governance and regulation. AI
systems are being integrated into various industries and sectors, creating a
demand for decision-makers to possess a comprehensive and nuanced
understanding of the capabilities and limitations of these systems. One
critical aspect of this demand is the ability to explain the results of machine
learning models, which is crucial to promoting transparency and trust in AI
systems, as well as fundamental in helping machine learning models to be
trained ethically. In this paper, we present novel metrics to quantify the
degree to which AI model predictions can be easily explained by their features.
Our metrics summarize different aspects of explainability into scalars,
providing a more comprehensive understanding of model predictions and
facilitating communication between decision-makers and stakeholders, thereby
increasing the overall transparency and accountability of AI systems.
Besides training, mathematical optimization is also used in deep learning to
model and solve formulations over trained neural networks for purposes such as
verification, compression, and optimization with learned constraints. However,
solving these formulations soon becomes difficult as the network size grows due
to the weak linear relaxation and dense constraint matrix. We have seen
improvements in recent years with cutting plane algorithms, reformulations, and
a heuristic based on Mixed-Integer Linear Programming (MILP). In this work, we
propose a more scalable heuristic based on exploring global and local linear
relaxations of the neural network model. Our heuristic is competitive with a
state-of-the-art MILP solver and the prior heuristic while producing better
solutions as the input size, depth, and number of neurons increase.
Recently, Deep Convolutional Neural Networks (DCNNs) including the ResNet-20
architecture have been privately evaluated on encrypted, low-resolution data
with the Residue-Number-System Cheon-Kim-Kim-Song (RNS-CKKS) homomorphic
encryption scheme. We extend methods for evaluating DCNNs on images with larger
dimensions and many channels, beyond what can be stored in single ciphertexts.
Additionally, we simplify and improve the efficiency of the recently introduced
multiplexed image format, demonstrating that homomorphic evaluation can work
with standard, row-major matrix packing, and that this yields encrypted inference
time speedups of $4.6\times$ to $6.5\times$. We also show how existing DCNN models can be
regularized during the training process to further improve efficiency and
accuracy. These techniques are applied to homomorphically evaluate a DCNN with
high accuracy on the high-resolution ImageNet dataset, achieving $80.2\%$ top-1
accuracy. We also achieve $98.3\%$ accuracy with homomorphically evaluated CNNs
on the CIFAR-10 dataset.
In this paper we prove Gamma-convergence of a nonlocal perimeter of Minkowski
type to a local anisotropic perimeter. The nonlocal model describes the
regularizing effect of adversarial training in binary classifications. The
energy essentially depends on the interaction between two distributions
modelling likelihoods for the associated classes. We overcome typical strict
regularity assumptions for the distributions by only assuming that they have
bounded $BV$ densities. In the natural topology coming from compactness, we
prove Gamma-convergence to a weighted perimeter with weight determined by an
anisotropic function of the two densities. Despite being local, this sharp
interface limit reflects classification stability with respect to adversarial
perturbations. We further apply our results to deduce Gamma-convergence of the
associated total variations, to study the asymptotics of adversarial training,
and to prove Gamma-convergence of graph discretizations for the nonlocal
perimeter.
We introduce NeuroSynt, a neuro-symbolic portfolio solver framework for
reactive synthesis. At the core of the solver lies a seamless integration of
neural and symbolic approaches to solving the reactive synthesis problem. To
ensure soundness, the neural engine is coupled with model checkers verifying
the predictions of the underlying neural models. The open-source implementation
of NeuroSynt provides an integration framework for reactive synthesis in which
new neural and state-of-the-art symbolic approaches can be seamlessly
integrated. Extensive experiments demonstrate its efficacy in handling
challenging specifications, enhancing the state-of-the-art reactive synthesis
solvers, with NeuroSynt contributing novel solves in the current SYNTCOMP
benchmarks.
Federated Learning (FL) is a machine learning approach that addresses privacy
and data transfer costs by processing data at its source. It is particularly
popular for Edge and IoT applications, where the FL aggregator server resides in
resource-constrained edge data centers to reduce communication costs. Existing
cloud-based aggregator solutions are resource-inefficient and expensive at the
Edge, leading to low scalability and high latency. To address these challenges,
this study compares prior and new aggregation methodologies under the changing
demands of IoT and Edge applications. This work is the first to propose an
adaptive FL aggregator at the Edge, enabling users to manage the cost and
efficiency trade-off. An extensive comparative analysis demonstrates that the
design improves scalability by up to 4X, time efficiency by 8X, and reduces
costs by more than 2X compared to extant cloud-based static methodologies.
We introduce the higher-order refactoring problem, where the goal is to
compress a logic program by discovering higher-order abstractions, such as map,
filter, and fold. We implement our approach in Stevie, which formulates the
refactoring problem as a constraint optimisation problem. Our experiments on
multiple domains, including program synthesis and visual reasoning, show that
refactoring can improve the learning performance of an inductive logic
programming system, specifically improving predictive accuracies by 27% and
reducing learning times by 47%. We also show that Stevie can discover
abstractions that transfer to multiple domains.
Large language models (LLMs) such as GPT-3.5 and CodeLlama are powerful
models for code generation and understanding. Fine-tuning these models comes
with a high computational cost and requires a large labeled dataset.
Alternatively, in-context learning techniques allow models to learn downstream
tasks with only a few examples. Recently, researchers have shown that in-context
learning performs well in bug detection and repair. In this paper, we propose a
code-pair classification task in which both the buggy and non-buggy versions
are given to the model, and the model identifies the buggy one. We evaluate
our task on a real-world bug detection dataset with two of the most powerful LLMs.
Our experiments indicate that an LLM can often pick the buggy version out of
the pair. They also indicate that the code-pair classification task is much easier
than being given a single snippet and deciding whether and where a bug exists.
Premise selection is a fundamental problem of automated theorem proving.
Previous works often use intricate symbolic methods, rely on domain knowledge,
and require significant engineering effort to solve this task. In this work, we
show that Magnushammer, a neural transformer-based approach, can outperform
traditional symbolic systems by a large margin. Tested on the PISA benchmark,
Magnushammer achieves $59.5\%$ proof rate compared to a $38.3\%$ proof rate of
Sledgehammer, the most mature and popular symbolic-based solver. Furthermore,
by combining Magnushammer with a neural formal prover based on a language
model, we significantly improve the previous state-of-the-art proof rate from
$57.0\%$ to $71.0\%$.
Graph Neural Networks are notorious for their memory consumption. A recent
Transformer-based GNN called Graph Transformer (GT) has been shown to achieve
superior performance when long-range dependencies exist. However, combining
graph data with the Transformer architecture compounds the memory issue. We
propose a novel version of the "edge regularization technique" that reduces the
need for Positional Encoding and thereby alleviates GT's out-of-memory issue.
We observe that it is not clear whether adding edge regularization on top of
positional encoding is helpful. However, it seems evident that applying our
edge regularization technique stably improves GT's performance compared
to GT without Positional Encoding.
Low-rank adaptation (LoRA) has emerged as a new paradigm for cost-efficient
fine-tuning of large language models (LLMs). However, fine-tuned LLMs often
become overconfident especially when fine-tuned on small datasets. Bayesian
methods, with their inherent ability to estimate uncertainty, serve as potent
tools to mitigate overconfidence and enhance calibration. In this work, we
introduce Laplace-LoRA, which applies a Bayesian approach to the LoRA
parameters. Specifically, Laplace-LoRA applies a Laplace approximation to the
posterior over the LoRA parameters, considerably improving the calibration of
fine-tuned LLMs.
Long-term fetal heart rate (FHR) monitoring during the antepartum period,
increasingly popularized by electronic FHR monitoring, represents a growing
approach in FHR monitoring. This kind of continuous monitoring, in contrast to
the short-term one, collects an extended period of fetal heart data. This
offers a more comprehensive understanding of the fetus's condition. However, the
interpretation of long-term antenatal fetal heart monitoring is still in its
early stages, lacking corresponding clinical standards. Furthermore, the
substantial amount of data generated by continuous monitoring imposes a
significant burden on clinical work when analyzed manually. To address the above
challenges, this study develops an automatic analysis system named LARA
(Long-term Antepartum Risk Analysis system) for continuous FHR monitoring,
combining deep learning and information fusion methods. LARA's core is a
well-established convolutional neural network (CNN) model. It processes
long-term FHR data as input and generates a Risk Distribution Map (RDM) and
Risk Index (RI) as the analysis results. We evaluate LARA on an internal test
dataset, obtaining the following performance metrics: AUC 0.872, accuracy 0.816,
specificity 0.811, sensitivity 0.806, precision 0.271, and F1 score 0.415. In
our study, we observe that long-term FHR monitoring data with higher RI is more
likely to result in adverse outcomes (p=0.0021). In conclusion, this study
introduces LARA, the first automated analysis system for long-term FHR
monitoring, laying the groundwork for further exploration of its clinical
value.
We study the problem of in-context learning (ICL) with large language models
(LLMs) on private datasets. This scenario poses privacy risks, as LLMs may leak
or regurgitate the private examples demonstrated in the prompt. We propose a
novel algorithm that generates synthetic few-shot demonstrations from the
private dataset with formal differential privacy (DP) guarantees, and show
empirically that it can achieve effective ICL. We conduct extensive experiments
on standard benchmarks and compare our algorithm with non-private ICL and
zero-shot solutions. Our results demonstrate that our algorithm can achieve
competitive performance with strong privacy levels. These results open up new
possibilities for ICL with privacy protection for a broad range of
applications.
In the rapidly evolving field of machine learning, adversarial attacks
present a significant challenge to model robustness and security.
Decision-based attacks, which only require feedback on the decision of a model
rather than detailed probabilities or scores, are particularly insidious and
difficult to defend against. This work introduces L-AutoDA (Large Language
Model-based Automated Decision-based Adversarial Attacks), a novel approach
leveraging the generative capabilities of Large Language Models (LLMs) to
automate the design of these attacks. By iteratively interacting with LLMs in
an evolutionary framework, L-AutoDA automatically designs competitive attack
algorithms efficiently without much human effort. We demonstrate the efficacy
of L-AutoDA on the CIFAR-10 dataset, showing significant improvements over baseline
methods in both success rate and computational efficiency. Our findings
underscore the potential of language models as tools for adversarial attack
generation and highlight new avenues for the development of robust AI systems.
This paper reports on the design and results of the 2024 ICASSP SP Cadenza
Challenge: Music Demixing/Remixing for Hearing Aids. The Cadenza project is
working to enhance the audio quality of music for those with a hearing loss.
The scenario for the challenge was listening to stereo reproduction over
loudspeakers via hearing aids. The task was to decompose pop/rock music into
vocal, drums, bass and other (VDBO); rebalance the different tracks with
specified gains; and then remix them back to stereo. End-to-end approaches were
also accepted. Seventeen systems were submitted by 11 teams. Causal systems
performed worse than non-causal approaches. Nine systems beat the baseline. A
common approach was to fine-tune pretrained demixing models. The best approach
used an ensemble of models.
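The rebalance-and-remix stage of the task amounts to scaling each demixed stem by its gain and summing the stems back into one stereo signal. This is a minimal sketch under stated assumptions. Gains are taken to be in decibels and converted with $10^{g/20}$, stems are lists of (left, right) sample pairs, and a missing gain defaults to 0 dB. The stem names and function names are hypothetical, not the challenge's official interface.

```python
from typing import Dict, List, Sequence, Tuple

Stereo = Sequence[Tuple[float, float]]  # (left, right) sample pairs


def db_to_linear(gain_db: float) -> float:
    """Convert an amplitude gain in dB to a linear factor."""
    return 10.0 ** (gain_db / 20.0)


def remix(stems: Dict[str, Stereo], gains_db: Dict[str, float]) -> List[Tuple[float, float]]:
    """Scale each demixed stem by its gain and sum the stems back into one stereo signal."""
    length = len(next(iter(stems.values())))
    out = [(0.0, 0.0)] * length
    for name, stem in stems.items():
        if len(stem) != length:
            raise ValueError("all stems must have the same number of samples")
        g = db_to_linear(gains_db.get(name, 0.0))  # assumption: missing gain means 0 dB
        out = [(acc_l + g * s_l, acc_r + g * s_r) for (acc_l, acc_r), (s_l, s_r) in zip(out, stem)]
    return out


stems = {"vocals": [(1.0, 1.0)], "drums": [(0.5, -0.5)]}
print(remix(stems, {"vocals": 20.0, "drums": 0.0}))  # [(10.5, 9.5)]
```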
We study a regularized interacting particle method for computing aggregation
patterns and near-singular solutions of a Keller-Segel (KS) chemotaxis system
in two and three space dimensions, and further develop the DeepParticle (DP)
method to learn and generate solutions under variations of physical parameters.
The KS solutions are approximated as empirical measures of particles which
self-adapt to the high gradient part of solutions. We utilize the
expressiveness of deep neural networks (DNNs) to represent the transform of
samples from a given initial (source) distribution to a target distribution at
finite time T prior to blowup without assuming invertibility of the transforms.
In the training stage, we update the network weights by minimizing a discrete
2-Wasserstein distance between the input and target empirical measures. To
reduce computational cost, we develop an iterative divide-and-conquer algorithm
to find the optimal transition matrix in the Wasserstein distance. We present
numerical results of the DP framework that demonstrate successful learning and generation of KS
dynamics in the presence of laminar and chaotic flows. The physical parameter
in this work is either the small diffusivity of chemo-attractant or the
reciprocal of the flow amplitude in the advection-dominated regime.
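The training loss above is a discrete 2-Wasserstein distance between two empirical measures. When both measures place equal weight $1/n$ on $n$ particles, some optimal transport plan is a permutation matrix (Birkhoff-von Neumann). This gives $W_2^2 = \min_\sigma \frac{1}{n}\sum_i \|x_i - y_{\sigma(i)}\|^2$. The sketch below evaluates that formula by brute force over permutations. It is exact but only feasible for a handful of particles, and it is not the iterative divide-and-conquer algorithm the abstract describes for large particle counts. The function and type names are illustrative.

```python
import itertools
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def w2_uniform_empirical(xs: Sequence[Point], ys: Sequence[Point]) -> float:
    """Exact W2 between two uniform empirical measures with the same number of particles."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("both measures need the same positive number of particles")
    best = math.inf
    # With equal uniform weights an optimal plan is a permutation (Birkhoff-von Neumann),
    # so brute force over permutations is exact, but only feasible for tiny n.
    for perm in itertools.permutations(range(n)):
        cost = sum(math.dist(xs[i], ys[j]) ** 2 for i, j in enumerate(perm)) / n
        best = min(best, cost)
    return math.sqrt(best)


print(w2_uniform_empirical([(0.0,), (2.0,)], [(1.0,), (3.0,)]))  # 1.0
```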
Within cardiovascular disease detection using deep learning applied to ECG
signals, the complexities of handling physiological signals have sparked
growing interest in leveraging deep generative models for effective data
augmentation. In this paper, we introduce a novel versatile approach based on
denoising diffusion probabilistic models for ECG synthesis, addressing three
scenarios: (i) heartbeat generation, (ii) partial signal imputation, and (iii)
full heartbeat forecasting. Ours is the first generalized
conditional approach for ECG synthesis, and our experimental results
demonstrate its effectiveness for various ECG-related tasks. Moreover, we show
that our approach outperforms other state-of-the-art ECG generative models and
can enhance the performance of state-of-the-art classifiers.
In recent years, there has been a noticeable increase in cyberattacks using
ransomware. Attackers use this malicious software to break into networks and
harm computer systems. This has caused significant and lasting damage to
organizations and individuals alike, including government agencies, private
companies, and regular users. These attacks often lead to the loss or exposure of sensitive
information, disruptions in normal operations, and persistent vulnerabilities.
This paper focuses on a method for recognizing and identifying ransomware in
computer networks. The approach relies on using machine learning algorithms and
analyzing the patterns of network traffic. By collecting and studying this
traffic, and then applying machine learning models, we can accurately identify
and detect ransomware. The results of implementing this method show that
machine learning algorithms can effectively pinpoint ransomware based on
network traffic, achieving high levels of precision and accuracy.
In-context learning (ICL) suffers from oversensitivity to the prompt, making
it unreliable in real-world scenarios. We study the sensitivity of ICL with
respect to multiple perturbation types. First, we find that label bias obscures
the true sensitivity, and therefore prior work may have significantly
underestimated ICL sensitivity. Second, we observe a strong negative
correlation between ICL sensitivity and accuracy: predictions sensitive to
perturbations are less likely to be correct. Motivated by these findings, we
propose SenSel, a few-shot selective prediction method that abstains
from sensitive predictions. Experiments on ten classification datasets show
that SenSel consistently outperforms two commonly used
confidence-based and entropy-based baselines on abstention decisions.
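The core idea of abstaining from sensitive predictions can be sketched simply. The model is queried on several perturbed versions of the same prompt, and it abstains when too many perturbations flip the answer away from the majority label. This is a hypothetical, generic illustration. The `classify` callable, the prompt variants, and the `min_agreement` threshold are assumptions, and SenSel's actual sensitivity measure and perturbation types may differ.

```python
from collections import Counter
from typing import Callable, Optional, Sequence


def predict_or_abstain(
    classify: Callable[[str], str],
    prompt_variants: Sequence[str],
    min_agreement: float = 0.8,
) -> Optional[str]:
    """Return the majority label over perturbed prompts, or None (abstain) if it is unstable."""
    labels = [classify(p) for p in prompt_variants]
    label, count = Counter(labels).most_common(1)[0]
    # Sensitivity here = fraction of perturbations that disagree with the majority label.
    if count / len(labels) < min_agreement:
        return None
    return label


def toy_classifier(prompt: str) -> str:
    """Stand-in for an ICL call to an LLM."""
    return "positive" if "good" in prompt else "negative"


variants = ["good movie", "a good film", "good!", "bad"]
print(predict_or_abstain(toy_classifier, variants))  # None: only 3 of 4 variants agree
```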
Existing analyses of the expressive capacity of Transformer models have
required excessively deep layers for data memorization, leading to a
discrepancy with the Transformers actually used in practice. This is primarily
due to the interpretation of the softmax function as an approximation of the
hardmax function. By clarifying the connection between the softmax function and
the Boltzmann operator, we prove that a single layer of self-attention with
low-rank weight matrices possesses the capability to perfectly capture the
context of an entire input sequence. As a consequence, we show that one-layer
and single-head Transformers have a memorization capacity for finite samples,
and that Transformers consisting of one self-attention layer with two
feed-forward neural networks are universal approximators for continuous
permutation equivariant functions on a compact domain.
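The argument above turns on reading softmax through the Boltzmann operator, $\mathrm{boltz}_\beta(x) = \sum_i x_i e^{\beta x_i} / \sum_j e^{\beta x_j}$. This operator is the softmax-weighted average of $x$ itself. It equals the mean of $x$ at $\beta = 0$ and approaches the hardmax value $\max_i x_i$ as $\beta \to \infty$. The short sketch below only illustrates this interpolation. It does not reproduce the paper's memorization construction, and the inputs and $\beta$ values are arbitrary.

```python
import math
from typing import List, Sequence


def softmax(x: Sequence[float], beta: float = 1.0) -> List[float]:
    """Softmax with inverse temperature beta, shifted by max(x) for numerical stability."""
    m = max(x)
    exps = [math.exp(beta * (v - m)) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def boltzmann(x: Sequence[float], beta: float = 1.0) -> float:
    """Boltzmann operator: the softmax-weighted average of x itself."""
    return sum(w * v for w, v in zip(softmax(x, beta), x))


x = [1.0, 2.0, 3.0]
for beta in (0.0, 1.0, 5.0, 50.0):
    print(beta, boltzmann(x, beta))  # rises from mean(x) = 2.0 toward max(x) = 3.0
```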
We propose a convex signal reconstruction method for block sparsity under
arbitrary linear transform with unknown block structure. The proposed method is
a generalization of the existing method LOP-$\ell_2$/$\ell_1$ and can
reconstruct signals with block sparsity under non-invertible transforms, unlike
LOP-$\ell_2$/$\ell_1$. Our work broadens the scope of block sparse
regularization, enabling more versatile and powerful applications across
various signal processing domains. We derive an iterative algorithm for solving
the proposed optimization problem and provide conditions for its convergence to the optimal
solution. Numerical experiments demonstrate the effectiveness of the proposed
method.
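For background, the standard block-sparsity regularizer is the mixed $\ell_2/\ell_1$ norm $\sum_b \|x_b\|_2$. Its proximal operator, block soft-thresholding, is the basic step inside many iterative solvers for such problems. The sketch below assumes a known, non-overlapping block partition. That assumption is exactly what LOP-$\ell_2$/$\ell_1$ and the proposed method avoid needing, so this is the classical building block, not the paper's algorithm. The function name and example values are illustrative.

```python
import math
from typing import List, Sequence


def group_soft_threshold(
    x: Sequence[float], blocks: Sequence[Sequence[int]], lam: float
) -> List[float]:
    """Prox of lam * sum_b ||x_b||_2 for a known, non-overlapping block partition."""
    out = list(x)
    for block in blocks:
        norm = math.sqrt(sum(x[i] ** 2 for i in block))
        # Shrink the whole block toward zero; blocks with norm <= lam vanish entirely.
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        for i in block:
            out[i] = scale * x[i]
    return out


print(group_soft_threshold([3.0, 4.0, 0.1, 0.1], [[0, 1], [2, 3]], lam=1.0))
```

The first block (norm 5) is shrunk by a factor of 0.8, while the second block (norm about 0.14) is set exactly to zero. This all-or-nothing behavior per block is what produces block sparsity.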
Electronic health records (EHRs) are increasingly popular, and with them comes
the application of machine learning solutions to various problems in the domain.
This growing research area also raises the need for EHR accessibility. The Medical
Information Mart for Intensive Care (MIMIC) dataset is a popular, public, and
free EHR dataset in a raw format that has been used in numerous studies.
However, despite its popularity, it lacks benchmarking work, especially
with recent state-of-the-art works in the field of deep learning with
time-series tabular data. The aim of this work is to fill this gap by
providing a benchmark for the latest version of the MIMIC dataset, MIMIC-IV. We
also give a detailed literature survey of studies that have already been done on
MIMIC-III.
Federated Learning (FL) enables collaborative model training among medical
centers without sharing private data. However, traditional FL is vulnerable to server
failures and suboptimal performance on local data due to the nature of
centralized model aggregation. To address these issues, we present Gossip
Mutual Learning (GML), a decentralized framework that uses Gossip Protocol for
direct peer-to-peer communication. In addition, GML encourages each site to
optimize its local model through mutual learning to account for data variations
among different sites. For the task of tumor segmentation using 146 cases from
four clinical sites in the BraTS 2021 dataset, we demonstrate that GML outperforms
local models and achieves performance similar to FedAvg with only 25%
communication overhead.
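For reference, FedAvg, the centralized baseline GML is compared against, aggregates client models on a server. It computes an average of the clients' parameters, weighted by the number of local training samples at each client. The sketch below shows that aggregation step on flattened parameter vectors. The list-of-floats representation and the function name are simplifying assumptions. GML replaces this central step with peer-to-peer gossip exchanges, which are not shown here.

```python
from typing import List, Sequence


def fedavg(client_weights: Sequence[Sequence[float]], client_sizes: Sequence[int]) -> List[float]:
    """Average flattened client parameter vectors, weighting each client by its sample count."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    dim = len(client_weights[0])
    aggregate = [0.0] * dim
    for weights, n in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must send vectors of the same length")
        for i, w in enumerate(weights):
            aggregate[i] += (n / total) * w
    return aggregate


# Two hypothetical sites: the second holds three times as much data.
print(fedavg([[1.0, 2.0], [3.0, 6.0]], [1, 3]))  # [2.5, 5.0]
```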
The emergence of the novel dummy data injection attack (DDIA) poses a severe
threat to the secure and stable operation of power systems. These attacks are
particularly perilous due to the minimal Euclidean spatial separation between
the injected malicious data and legitimate data, rendering their precise
detection challenging using conventional distance-based methods. Furthermore,
existing research predominantly focuses on various machine learning techniques,
often analyzing the temporal data sequences post-attack or relying solely on
Euclidean spatial characteristics. Unfortunately, this approach tends to
overlook the inherent topological correlations within the non-Euclidean spatial
attributes of power grid data, consequently leading to diminished accuracy in
attack localization. To address this issue, this study takes a comprehensive
approach. Initially, it examines the underlying principles of these new DDIAs
on power systems. Here, an intricate mathematical model of the DDIA is
designed, accounting for incomplete topological knowledge and alternating
current (AC) state estimation from an attacker's perspective. Subsequently, by
integrating a priori knowledge of grid topology and considering the temporal
correlations within measurement data and the topology-dependent attributes of
the power grid, this study introduces temporal and spatial attention matrices.
These matrices adaptively capture the spatio-temporal correlations within the
attacks. Leveraging gated stacked causal convolution and graph wavelet sparse
convolution, the study jointly extracts spatio-temporal DDIA features. Finally,
the research proposes a DDIA localization method based on spatio-temporal graph
neural networks. The accuracy and effectiveness of the DDIA model are
rigorously demonstrated through comprehensive analytical cases.
Object detection in reduced visibility has become a prominent research area.
Existing techniques are not accurate enough in recognizing objects under
such circumstances. This paper introduces a new method for object detection in
fog, based on a two-stage architecture that first identifies regions in the
input image and then detects objects within those regions. The paper confirms
notable improvements in the proposed method's accuracy and detection time over
existing techniques.
Experiments at the High-Luminosity LHC and the Future Circular Collider need
efficient algorithms to reconstruct granular events expected at such detectors
with high fidelity. We study scalable machine learning models for event
reconstruction in electron-positron collisions based on a full detector
simulation. Particle-flow reconstruction can be formulated as a supervised
learning task using tracks and calorimeter clusters. We compare a graph neural
network and a kernel-based transformer and demonstrate that we can avoid
quadratic operations while achieving realistic reconstruction. We show that
hyperparameter tuning significantly improves the performance of the models. The
best graph neural network model shows improvement in the jet transverse
momentum resolution by up to 50% compared to the rule-based algorithm. Accurate
reconstruction can significantly improve future measurements at colliders. The
resulting model is portable across Nvidia, AMD and Habana hardware. Our
datasets and software are published following the findable, accessible,
interoperable, and reusable principles.
Multi-distribution learning generalizes the classic PAC learning to handle
data coming from multiple distributions. Given a set of $k$ data distributions
and a hypothesis class of VC dimension $d$, the goal is to learn a hypothesis
that minimizes the maximum population loss over $k$ distributions, up to
$\epsilon$ additive error. In this paper, we settle the sample complexity of
multi-distribution learning by giving an algorithm of sample complexity
$\widetilde{O}((d+k)\epsilon^{-2}) \cdot (k/\epsilon)^{o(1)}$. This matches the
lower bound up to a sub-polynomial factor and resolves the COLT 2023 open problem
of Awasthi, Haghtalab and Zhao [AHZ23].
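The learning goal described above can be stated as follows. The notation is ours, not the paper's: $\mathcal{H}$ is the hypothesis class, $\ell$ is the loss, $\mathcal{D}_1,\dots,\mathcal{D}_k$ are the distributions, and $\hat h$ is the learner's output. The paper's exact formulation, for example whether $\hat h$ may be randomized, may differ.

```latex
L_{\mathcal{D}_i}(h) = \mathbb{E}_{(x,y)\sim\mathcal{D}_i}\big[\ell(h(x),y)\big],
\qquad
\max_{1\le i\le k} L_{\mathcal{D}_i}(\hat h)
\;\le\; \min_{h\in\mathcal{H}} \, \max_{1\le i\le k} L_{\mathcal{D}_i}(h) + \epsilon .
```

The sample complexity is the number of draws, pooled across all $k$ distributions, needed to output such an $\hat h$ with high probability.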
This paper studies the theoretical framework of the alignment process of
generative models with Reinforcement Learning from Human Feedback (RLHF). We
consider a standard mathematical formulation, the reverse-KL regularized
contextual bandit for RLHF. Despite its widespread practical application, a
rigorous theoretical analysis of this formulation remains open. We investigate
its behavior in three distinct settings -- offline, online, and hybrid -- and
propose efficient algorithms with finite-sample theoretical guarantees.
Moving towards practical applications, our framework, with a robust
approximation of the information-theoretical policy improvement oracle,
naturally gives rise to several novel RLHF algorithms. This includes an
iterative version of the Direct Preference Optimization (DPO) algorithm for
online settings, and a multi-step rejection sampling strategy for offline
scenarios. Our empirical evaluations on real-world alignment experiments with
large language models demonstrate that these proposed methods significantly
surpass existing strong baselines, such as DPO and Rejection Sampling
Optimization (RSO), showcasing the connections between solid theoretical
foundations and their powerful practical implementations.
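The reverse-KL regularized contextual bandit has a standard closed-form maximizer. For a fixed context $x$, the policy maximizing $\mathbb{E}_{a\sim\pi}[r(x,a)] - \eta\,\mathrm{KL}(\pi\,\|\,\pi_0)$ is the Gibbs policy $\pi^*(a\mid x) \propto \pi_0(a\mid x)\exp(r(x,a)/\eta)$. The sketch below computes this idealized target for one context with a finite set of candidate responses. It is not the paper's iterative DPO or rejection-sampling algorithm, and the names `pi0`, `rewards` and `eta` are illustrative assumptions.

```python
import numpy as np


def kl_regularized_policy(pi0, rewards, eta):
    """Maximizer of E_pi[r] - eta * KL(pi || pi0) for a single context."""
    logits = np.log(pi0) + rewards / eta
    logits -= logits.max()          # stabilize the exponential
    weights = np.exp(logits)
    return weights / weights.sum()


def regularized_value(pi, pi0, rewards, eta):
    """Objective E_pi[r] - eta * KL(pi || pi0); assumes pi > 0 everywhere."""
    return float(pi @ rewards - eta * np.sum(pi * np.log(pi / pi0)))


pi0 = np.array([0.5, 0.3, 0.2])      # hypothetical reference (e.g. SFT) policy
rewards = np.array([0.0, 1.0, 2.0])  # hypothetical reward-model scores
for eta in (10.0, 1.0, 0.1):
    # large eta stays close to pi0; small eta concentrates on the best response
    print(eta, np.round(kl_regularized_policy(pi0, rewards, eta), 3))
```

At the optimum the objective equals $\eta \log \mathbb{E}_{a\sim\pi_0}[\exp(r/\eta)]$. This identity is a convenient sanity check for any approximation of the oracle.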
Missing data is a common problem in practical settings. Various imputation
methods have been developed to deal with missing data. However, even though the
label is usually available in the training data, the common practice of
imputation usually only relies on the input and ignores the label. In this
work, we illustrate how stacking the label into the input can significantly
improve the imputation of the input. In addition, we propose a classification
strategy that initializes the predicted test label with missing values and
stacks the label with the input for imputation. This allows imputing the label
and the input at the same time. Also, the technique is capable of handling
training data with missing labels without any prior imputation and is applicable to
continuous, categorical, or mixed-type data. Experiments show promising results
in terms of accuracy.
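Below is a minimal sketch of the label-stacking idea. It assumes a simple iterative least-squares imputer as a stand-in, since the abstract does not say which imputer the authors use. The label is appended as an extra column. Test rows carry NaN in that column, so the same imputation pass fills in missing features and missing labels together. `iterative_impute` and `impute_with_label` are hypothetical names. This sketch covers only continuous data; categorical labels would need an encoding the abstract does not describe.

```python
import numpy as np


def iterative_impute(Z, n_iter=10):
    """Fill NaNs column by column with least-squares regressions on the other columns."""
    Z = np.array(Z, dtype=float)
    mask = np.isnan(Z)
    # initialize every missing entry with its column mean
    Z[mask] = np.take(np.nanmean(Z, axis=0), np.nonzero(mask)[1])
    for _ in range(n_iter):
        for j in range(Z.shape[1]):
            miss = mask[:, j]
            if not miss.any():
                continue
            # design matrix: intercept plus all other (current) columns
            A = np.column_stack([np.ones(len(Z)), np.delete(Z, j, axis=1)])
            coef, *_ = np.linalg.lstsq(A[~miss], Z[~miss, j], rcond=None)
            Z[miss, j] = A[miss] @ coef
    return Z


def impute_with_label(X, y):
    """Stack the label as an extra column, impute jointly, then split again.

    Rows whose label is NaN (e.g. test rows) receive a predicted label as a by-product.
    """
    Z = iterative_impute(np.column_stack([X, y]))
    return Z[:, :-1], Z[:, -1]
```

When a feature is strongly tied to the label, regressing it on the label column recovers it far better than regressing it on the remaining inputs alone.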
In this work, we study a natural nonparametric estimator of the transition
probability matrices of a finite controlled Markov chain. We consider an
offline setting with a fixed dataset, collected using a so-called logging
policy. We develop sample complexity bounds for the estimator and establish
conditions for minimaxity. Our statistical bounds depend on the logging policy
through its mixing properties. We show that achieving a particular statistical
risk bound involves a subtle and interesting trade-off between the strength of
the mixing properties and the number of samples. We demonstrate the validity of
our results under various examples, such as ergodic Markov chains, weakly
ergodic inhomogeneous Markov chains, and controlled Markov chains with
non-stationary Markov, episodic, and greedy controls. Lastly, we use these
sample complexity bounds to establish concomitant ones for offline evaluation
of stationary Markov control policies.
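A natural nonparametric estimator in this setting is the empirical transition frequency $\hat P(s'\mid s,a) = N(s,a,s')/N(s,a)$, computed from transitions logged under the logging policy. We assume that reading here. The sketch below implements it and also returns the visit counts $N(s,a)$, since sample-complexity bounds of this kind are typically driven by how often each state-action pair is visited. The uniform fallback for unvisited pairs is our own arbitrary choice, not something the abstract specifies.

```python
import numpy as np


def estimate_transitions(transitions, n_states, n_actions):
    """Empirical estimate P_hat[s, a, s'] = N(s, a, s') / N(s, a) from logged (s, a, s') triples."""
    counts = np.zeros((n_states, n_actions, n_states))
    for s, a, s_next in transitions:
        counts[s, a, s_next] += 1.0
    visits = counts.sum(axis=2, keepdims=True)          # N(s, a)
    # unvisited (s, a) pairs fall back to a uniform row (an arbitrary choice)
    p_hat = np.divide(counts, visits,
                      out=np.full_like(counts, 1.0 / n_states),
                      where=visits > 0)
    return p_hat, visits[..., 0]


# a short hypothetical log of (state, action, next_state) triples
log = [(0, 0, 1), (1, 0, 0), (0, 0, 1), (1, 1, 1)]
p_hat, n_sa = estimate_transitions(log, n_states=2, n_actions=2)
```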
Transfer learning plays a key role in modern data analysis when: (1) the
target data are scarce but the source data are sufficient; (2) the
distributions of the source and target data are heterogeneous. This paper
develops an interpretable unified transfer learning model, termed UTrans,
which can detect both transferable variables and source data. We
establish estimation error bounds and prove that they are lower than those
obtained with target data only. In addition, we propose a source
detection algorithm based on hypothesis testing to exclude the nontransferable
data. We evaluate and compare UTrans to the existing algorithms in multiple
experiments. It is shown that UTrans attains much lower estimation and
prediction errors than the existing methods, while preserving interpretability.
We finally apply it to the US intergenerational mobility data and compare our
proposed algorithms to the classical machine learning algorithms.
We study policy optimization algorithms for computing correlated equilibria
in multi-player general-sum Markov Games. Previous results achieve
$O(T^{-1/2})$ convergence rate to a correlated equilibrium and an accelerated
$O(T^{-3/4})$ convergence rate to the weaker notion of coarse correlated
equilibrium. In this paper, we improve both results significantly by providing
an uncoupled policy optimization algorithm that attains a near-optimal
$\tilde{O}(T^{-1})$ convergence rate for computing a correlated equilibrium.
Our algorithm is constructed by combining two main elements: (i) smooth value
updates and (ii) the optimistic-follow-the-regularized-leader algorithm with
the log barrier regularizer.
Interpreting deep learning time series models is crucial in understanding the
model's behavior and learning patterns from raw data for real-time
decision-making. However, the complexity inherent in transformer-based time
series models poses challenges in explaining the impact of individual features
on predictions. In this study, we leverage recent local interpretation methods
to interpret state-of-the-art time series models. For real-world data,
we collected three years of daily case data for 3,142 US counties. Firstly, we
compare six transformer-based models and choose the best prediction model for
COVID-19 infection. Using 13 input features from the last two weeks, we can
predict the cases for the next two weeks. Secondly, we present an innovative
way to evaluate the prediction sensitivity to 8 population age groups over
highly dynamic multivariate infection data. Thirdly, we compare our proposed
perturbation-based interpretation method with related work, including a total
of eight local interpretation methods. Finally, we apply our framework to
traffic and electricity datasets, demonstrating that our approach is generic
and can be applied to other time-series domains.
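Perturbation-based interpretation can be sketched generically. Shift one input feature across the look-back window, then measure how much the forecast moves per unit of shift. The sketch below is a generic sensitivity measure under that assumption, not the authors' specific method. The additive shift `delta` and the array shapes are our own choices. In the COVID-19 setting, feature `j` could stand for one of the age-group inputs.

```python
import numpy as np


def perturbation_sensitivity(model, X, delta=0.1):
    """Mean absolute change in the forecast per unit shift of each input feature.

    X has shape (n_windows, n_timesteps, n_features); model maps such an array
    to forecasts of shape (n_windows, horizon).
    """
    base = model(X)
    scores = np.zeros(X.shape[-1])
    for j in range(X.shape[-1]):
        X_pert = X.copy()
        X_pert[..., j] += delta  # shift feature j over the whole look-back window
        scores[j] = np.mean(np.abs(model(X_pert) - base)) / delta
    return scores
```

For a black-box transformer, `model` would wrap the trained network's prediction call. Scores are comparable across features only if the features share a common scale, so inputs are usually normalized first.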
This paper presents a plugin that adds a representation of homogeneous and
heterogeneous, optically thick, translucent materials to the Blender 3D
modeling tool. The working principle of this plugin is based on a combination
of Genetic Algorithm (GA) and Singular Value Decomposition (SVD)-based
subsurface scattering method (GenSSS). The proposed plugin has been implemented
using the Mitsuba renderer, which is open-source rendering software. The
proposed plugin has been validated on measured subsurface scattering data. The
results show that the plugin visualizes homogeneous and heterogeneous
subsurface scattering effects accurately, compactly, and computationally
efficiently.
This paper presents a description of a real-world, multivariate time series
dataset collected from an anonymized engine component (called Component X) of a
fleet of trucks from SCANIA, Sweden. This dataset includes diverse variables
capturing detailed operational data, repair records, and specifications of
trucks while maintaining confidentiality by anonymization. It is well-suited
for a range of machine learning applications, such as classification,
regression, survival analysis, and anomaly detection, particularly when applied
to predictive maintenance scenarios. The large population size and variety of
features in the format of histograms and numerical counters, along with the
inclusion of temporal information, make this real-world dataset unique in the
field. The objective of releasing this dataset is to give a broad range of
researchers the possibility of working with real-world data from an
internationally well-known company and introduce a standard benchmark to the
predictive maintenance field, fostering reproducible research.
Background: The semantics of entities extracted from a clinical text can be
dramatically altered by modifiers, including entity negation, uncertainty,
conditionality, severity, and subject. Existing models for determining
modifiers of clinical entities involve regular expressions or feature weights
that are trained independently for each modifier.
Methods: We develop and evaluate a multi-task transformer architecture design
where modifiers are learned and predicted jointly using the publicly available
SemEval 2015 Task 14 corpus and a new Opioid Use Disorder (OUD) data set that
contains modifiers shared with SemEval as well as novel modifiers specific for
OUD. We evaluate the effectiveness of our multi-task learning approach versus
previously published systems and assess the feasibility of transfer learning
for clinical entity modifiers when only a portion of clinical modifiers are
shared.
Results: Our approach achieved state-of-the-art results on the ShARe corpus
from SemEval 2015 Task 14, showing an increase of 1.1% on weighted accuracy,
1.7% on unweighted accuracy, and 10% on micro F1 scores.
Conclusions: We show that learned weights from our shared model can be
effectively transferred to a new partially matched data set, validating the use
of transfer learning for clinical text modifiers.
Pre-training is known to generate universal representations for downstream
tasks in large-scale deep learning such as large language models. Existing
literature, e.g., \cite{kim2020adversarial}, empirically observes that the
downstream tasks can inherit the adversarial robustness of the pre-trained
model. We provide theoretical justifications for this robustness inheritance
phenomenon. Our theoretical results reveal that feature purification plays an
important role in connecting the adversarial robustness of the pre-trained
model and the downstream tasks in two-layer neural networks. Specifically, we
show that (i) with adversarial training, each hidden node tends to pick only
one feature (or a few); (ii) without adversarial training, the hidden nodes can
be vulnerable to attacks. This observation is valid for both supervised
pre-training and contrastive learning. With purified nodes, it turns out that
clean training is enough to achieve adversarial robustness in downstream tasks.
We explore a novel methodology for constructing confidence regions for
parameters of linear models, using predictions from an arbitrary predictor.
Our framework requires minimal assumptions on the noise and can be extended to
functions deviating from strict linearity up to some adjustable threshold,
thereby accommodating a comprehensive and pragmatically relevant set of
functions. The derived confidence regions can be cast as constraints within a
Mixed Integer Linear Programming framework, enabling optimisation of linear
objectives. This representation enables robust optimization and the extraction
of confidence intervals for specific parameter coordinates. Unlike previous
methods, the confidence region can be empty, which can be used for hypothesis
testing. Finally, we validate the empirical applicability of our method on
synthetic data.
This paper studies the estimation and inference of treatment histories in
panel data settings when treatments change dynamically over time.
We propose a method that allows for (i) treatments to be assigned dynamically
over time based on high-dimensional covariates, past outcomes and treatments;
(ii) outcomes and time-varying covariates to depend on treatment trajectories;
(iii) heterogeneity of treatment effects.
Our approach recursively projects potential outcomes' expectations on past
histories. It then controls the bias by balancing dynamically observable
characteristics. We study the asymptotic and numerical properties of the
estimator and illustrate the benefits of the procedure in an empirical
application.
This paper proposes a model for learning Semi-parametric relationships in an
Expert Bayesian Network (SEBN) with linear parameter and structure constraints.
We use Gaussian Processes and a Horseshoe prior to introduce minimal nonlinear
components. To prioritize modifying the expert graph over adding new edges,
we optimize differential Horseshoe scales. In real-world datasets with unknown
truth, we generate diverse graphs to accommodate user input, addressing
identifiability issues and enhancing interpretability. Evaluation on
synthetic and UCI Liver Disorders datasets, using metrics like structural
Hamming distance and test likelihood, demonstrates that our models outperform a
state-of-the-art semi-parametric Bayesian Network model.
We provide a nonasymptotic analysis of the convergence of the stochastic
gradient Hamiltonian Monte Carlo (SGHMC) to a target measure in Wasserstein-2
distance without assuming log-concavity. Our analysis quantifies key
theoretical properties of the SGHMC as a sampler under local conditions, which
significantly improves on previous results. In particular, we
prove that the Wasserstein-2 distance between the target and the law of the
SGHMC is uniformly controlled by the step-size of the algorithm, thereby
demonstrating that the SGHMC can provide high-precision results uniformly in the
number of iterations. The analysis also allows us to obtain nonasymptotic
bounds for nonconvex optimization problems under local conditions and implies
that the SGHMC, when viewed as a nonconvex optimizer, converges to a global
minimum with the best known rates. We apply our results to obtain nonasymptotic
bounds for scalable Bayesian inference and nonasymptotic generalization bounds.
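Below is a minimal SGHMC sketch. It assumes unit mass, friction $\gamma$, step size $h$ and the update $v_{k+1} = v_k - h\big(\gamma v_k + \widetilde{\nabla U}(\theta_k)\big) + \sqrt{2\gamma h}\,\xi_k$, $\theta_{k+1} = \theta_k + h v_{k+1}$, targeting $\pi \propto e^{-U}$. Here $\widetilde{\nabla U}$ is a noisy gradient and $\xi_k$ is standard Gaussian noise. This is one common discretization, not necessarily the exact scheme analyzed in the paper. The Gaussian target and the artificial gradient noise are illustrative assumptions.

```python
import numpy as np


def sghmc(grad_U, theta0, step=0.1, friction=1.0, n_steps=20000, grad_noise=0.0, seed=0):
    """Euler-type discretization of underdamped Langevin dynamics with noisy gradients."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    v = np.zeros_like(theta)
    samples = np.empty((n_steps,) + theta.shape)
    for k in range(n_steps):
        # stochastic gradient: exact gradient plus Gaussian noise (stand-in for minibatching)
        g = grad_U(theta) + grad_noise * rng.standard_normal(theta.shape)
        v = v - step * (friction * v + g) + np.sqrt(2.0 * friction * step) * rng.standard_normal(theta.shape)
        theta = theta + step * v
        samples[k] = theta
    return samples


# target N(0, 1): U(theta) = theta**2 / 2, so grad_U(theta) = theta
samples = sghmc(lambda t: t, theta0=[3.0], grad_noise=0.5)[2000:, 0]  # drop burn-in
```

After burn-in, the samples should have mean near 0 and variance near 1, up to a discretization bias that shrinks with the step size.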
Many machine learning applications require operating on a spatially
distributed dataset. Despite technological advances, privacy considerations and
communication constraints may prevent gathering the entire dataset in a central
unit. In this paper, we propose a distributed sampling scheme based on the
alternating direction method of multipliers, which is commonly used in the
optimization literature due to its fast convergence. In contrast to distributed
optimization, distributed sampling allows for uncertainty quantification in
Bayesian inference tasks. We provide both theoretical guarantees of our
algorithm's convergence and experimental evidence of its superiority to the
state-of-the-art. For our theoretical results, we use convex optimization tools
to establish a fundamental inequality on the generated local sample iterates.
This inequality enables us to show convergence of the distribution associated
with these iterates to the underlying target distribution in Wasserstein
distance. In simulation, we deploy our algorithm on linear and logistic
regression tasks and illustrate its fast convergence compared to existing
gradient-based methods.
In this work, we leverage the intrinsic segmentation of language sequences
and design a new positional encoding method called Bilevel Positional Encoding
(BiPE). For each position, our BiPE blends an intra-segment encoding and an
inter-segment encoding. The intra-segment encoding identifies the locations
within a segment and helps the model capture the semantic information therein
via absolute positional encoding. The inter-segment encoding specifies the
segment index, models the relationships between segments, and aims to improve
extrapolation capabilities via relative positional encoding. Theoretical
analysis shows this disentanglement of positional information makes learning
more effective. The empirical results also show that our BiPE has superior
length extrapolation capabilities across a wide range of tasks in diverse text
modalities.
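BiPE splits each token's position into two indices: its offset inside the current segment, and the index of that segment. The sketch below computes both. It assumes, for illustration only, that segments are sentences closed by `.`, `!` or `?`. The intra-segment index would then feed an absolute positional encoding, and the segment index a relative scheme such as a rotary-style encoding. The encodings themselves are not shown.

```python
def bilevel_positions(tokens, is_segment_end):
    """Return (intra-segment position, segment index) for every token.

    is_segment_end(token) decides whether a token closes the current segment;
    the closing token itself still belongs to that segment.
    """
    intra, segment = [], []
    pos, seg = 0, 0
    for tok in tokens:
        intra.append(pos)
        segment.append(seg)
        if is_segment_end(tok):
            seg, pos = seg + 1, 0   # next token starts a new segment at offset 0
        else:
            pos += 1
    return intra, segment


tokens = ["The", "cat", "sat", ".", "It", "ran", "."]
intra, segment = bilevel_positions(tokens, lambda t: t in {".", "!", "?"})
```

Because the intra-segment index resets at every boundary, it stays within the range of segment lengths seen during training, even when the total sequence is much longer.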
The notion of Boolean logic backpropagation was introduced to build neural
networks with Boolean weights and activations. Most computations can be done
with Boolean logic instead of real arithmetic, during both training and
inference. However, the underlying discrete optimization problem is NP-hard,
and Boolean logic backpropagation comes with no convergence guarantee. In this
work, we propose the first convergence analysis, under standard non-convex assumptions.
Reinforcement Learning from Human Feedback (RLHF) is a pivotal technique that
aligns language models closely with human-centric values. The initial phase of
RLHF involves learning human values using a reward model from ranking data. It
is observed that the performance of the reward model degrades after one epoch
of training, and optimizing too much against the learned reward model
eventually hinders the true objective. This paper delves into these issues,
leveraging the theoretical insights to design an improved reward learning
algorithm termed 'Iterative Data Smoothing' (IDS). The core idea is that during
each training epoch, we not only update the model with the data, but also
update the data using the model, replacing hard labels with soft labels. Our
empirical findings highlight the superior performance of this approach over
traditional methods.
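Here is a minimal sketch of the idea as described: each epoch takes one gradient step of a reward model on the current labels, then moves the labels toward the model's own predictions. The following are our assumptions, not details from the paper: the Bradley-Terry model with one scalar reward per item, the smoothing rate `beta`, and the learning rate. The actual IDS update rule may differ.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_ids(pairs, n_items, epochs=50, lr=0.5, beta=0.3):
    """Bradley-Terry reward learning with iterative label smoothing.

    pairs[k] = (winner, loser); r holds one scalar reward per item.
    """
    pairs = np.asarray(pairs)
    r = np.zeros(n_items)
    y = np.ones(len(pairs))                     # hard labels: winner preferred
    for _ in range(epochs):
        # 1) update the model with the data: one gradient step on binary cross-entropy
        p = sigmoid(r[pairs[:, 0]] - r[pairs[:, 1]])
        grad = np.zeros(n_items)
        np.add.at(grad, pairs[:, 0], p - y)
        np.add.at(grad, pairs[:, 1], y - p)
        r -= lr * grad / len(pairs)
        # 2) update the data with the model: soften labels toward the model's predictions
        p = sigmoid(r[pairs[:, 0]] - r[pairs[:, 1]])
        y = (1.0 - beta) * y + beta * p
    return r, y


pairs = [(0, 1), (0, 1), (1, 2), (0, 2)]   # hypothetical preference data
r, y = train_ids(pairs, n_items=3)
```

Labels never leave $(0, 1]$ because each update is a convex combination of the previous label and a sigmoid output. As the labels approach the model's predictions, the gradient shrinks, which damps the over-optimization of the reward model.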
This paper develops a new dimension-free Azuma-Hoeffding type bound on the
norm of the sum of a martingale difference sequence with random individual
bounds. With this novel result, we provide high-probability bounds for the
gradient norm estimator in the proposed algorithm Prob-SARAH, which is a
modified version of the StochAstic Recursive grAdient algoritHm (SARAH), a
state-of-the-art variance-reduced algorithm that achieves optimal computational
complexity in expectation for the finite sum problem. The in-probability
complexity by Prob-SARAH matches the best in-expectation result up to
logarithmic factors. Empirical experiments demonstrate the superior
probabilistic performance of Prob-SARAH on real datasets compared to other
popular algorithms.
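At the core of SARAH, which Prob-SARAH modifies, is a recursive gradient estimator: $v_t = \nabla f_{i_t}(w_t) - \nabla f_{i_t}(w_{t-1}) + v_{t-1}$, refreshed with a full gradient at the start of each outer loop. The sketch below shows plain SARAH only. It does not reproduce the probabilistic stopping rule that distinguishes Prob-SARAH. The least-squares problem, step size and loop lengths are illustrative assumptions.

```python
import numpy as np


def sarah(grad_i, full_grad, w0, n, step, inner_steps, outer_steps, seed=0):
    """Plain SARAH: recursive variance-reduced gradient estimates for a finite sum."""
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    for _ in range(outer_steps):
        w_prev = w.copy()
        v = full_grad(w)                 # full gradient at the start of each outer loop
        w = w - step * v
        for _ in range(inner_steps):
            i = rng.integers(n)
            v = grad_i(w, i) - grad_i(w_prev, i) + v   # recursive estimator update
            w_prev, w = w, w - step * v
    return w


# hypothetical finite-sum least-squares problem: f_i(w) = 0.5 * (x_i @ w - y_i) ** 2
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5])
grad_i = lambda w, i: X[i] * (X[i] @ w - y[i])
full_grad = lambda w: X.T @ (X @ w - y) / len(y)
L = np.max(np.sum(X ** 2, axis=1))       # largest per-sample smoothness constant
w = sarah(grad_i, full_grad, np.zeros(3), n=50, step=0.1 / L,
          inner_steps=1000, outer_steps=15)
```

Unlike SVRG, SARAH does not anchor each step to the snapshot gradient. It updates the previous estimate instead, which is why its error terms form the martingale-like sum that high-probability analyses must control.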
Good arm identification (GAI) is a pure-exploration bandit problem in which a
single learner outputs an arm as soon as it is identified as a good arm. A good
arm is defined as an arm with an expected reward greater than or equal to a
given threshold. This paper focuses on the GAI problem under a small threshold
gap, which refers to the distance between the expected rewards of arms and the
given threshold. We propose a new algorithm called lil'HDoC to significantly
improve the total sample complexity of the HDoC algorithm. We demonstrate that
the sample complexity of the first $\lambda$ output arms in lil'HDoC is bounded
by that of the original HDoC algorithm, except for one negligible term, when the
distance between the expected reward and threshold is small. Extensive
experiments confirm that our algorithm outperforms the state-of-the-art
algorithms in both synthetic and real-world datasets.
We present new concentration inequalities for either martingale dependent or
exchangeable random symmetric matrices under a variety of tail conditions,
encompassing standard Chernoff bounds to self-normalized heavy-tailed settings.
These inequalities are often randomized in a way that renders them strictly
tighter than existing deterministic results in the literature, are typically
expressed in the Loewner order, and are sometimes valid at arbitrary
data-dependent stopping times.
Along the way, we explore the theory of matrix supermartingales and maximal
inequalities, potentially of independent interest.
Matrix completion is one of the crucial tools in modern data science
research. Recently, a novel sampling model for matrix completion coined
cross-concentrated sampling (CCS) has attracted much attention. However, the
robustness of the CCS model against sparse outliers remains unclear in the
existing studies. In this paper, we aim to answer this question by exploring a
novel Robust CCS Completion problem. A highly efficient non-convex iterative
algorithm, dubbed Robust CUR Completion (RCURC), is proposed. The empirical
performance of the proposed algorithm, in terms of both efficiency and
robustness, is verified in synthetic and real datasets.
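RCURC builds on the CUR decomposition: approximate $A \approx C\,U^{+}R$, where $C$ and $R$ are sampled columns and rows and $U$ is their intersection. The sketch below shows only this noiseless building block. It omits the sparse-outlier handling and the partial observations within the sampled rows and columns that RCURC addresses. The row and column choices are illustrative assumptions.

```python
import numpy as np


def cur_approximation(A, rows, cols):
    """A ~= C @ pinv(U) @ R with C = A[:, cols], R = A[rows, :], U = A[rows][:, cols]."""
    C = A[:, cols]
    R = A[rows, :]
    U = A[np.ix_(rows, cols)]
    return C @ np.linalg.pinv(U) @ R


rng = np.random.default_rng(0)
A = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 20))  # rank-2 test matrix
A_hat = cur_approximation(A, rows=[0, 1], cols=[0, 1])
```

When the intersection $U$ has the same rank as $A$, the reconstruction is exact. Sparse outliers in the sampled rows or columns break this, which is the robustness question the abstract raises.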
The paper studies the problem of constructing nonparametric simultaneous
confidence bands with nonasymptotic and distribution-free guarantees. The
target function is assumed to be band-limited and the approach is based on the
theory of Paley-Wiener reproducing kernel Hilbert spaces. The starting point of
the paper is a recently developed algorithm to which we propose three types of
improvements. First, we relax the assumptions on the noises by replacing the
symmetry assumption with a weaker distributional invariance principle.
Then, we propose a more efficient way to estimate the norm of the target
function, and finally we enhance the construction of the confidence bands by
tightening the constraints of the underlying convex optimization problems. The
refinements are also illustrated through numerical experiments.
Data analysis often requires methods that are invariant with respect to
specific transformations, such as rotations in the case of images or shifts in the case
of images and time series. While principal component analysis (PCA) is a
widely used dimension reduction technique, it lacks robustness with respect to
these transformations. Modern alternatives, such as autoencoders, can be
invariant with respect to specific transformations but are generally not
interpretable. We introduce General Transform-Invariant Principal Component
Analysis (GT-PCA) as an effective and interpretable alternative to PCA and
autoencoders. We propose a neural network that efficiently estimates the
components and show that GT-PCA significantly outperforms alternative methods
in experiments based on synthetic and real data.
Logistic regression is a ubiquitous method for probabilistic classification.
However, the effectiveness of logistic regression depends upon careful and
relatively computationally expensive tuning, especially for the regularisation
hyperparameter, and especially in the context of high-dimensional data. We
present a prevalidated ridge regression model that closely matches logistic
regression in terms of classification error and log-loss, particularly for
high-dimensional data, while being significantly more computationally efficient
and having effectively no hyperparameters beyond regularisation. We scale the
coefficients of the model so as to minimise log-loss for a set of prevalidated
predictions derived from the estimated leave-one-out cross-validation error.
This exploits quantities already computed in the course of fitting the ridge
regression model in order to find the scaling parameter with nominal additional
computational expense.
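Below is a minimal sketch of the idea. Fit ridge regression to $\pm 1$ labels. Obtain exact leave-one-out predictions from the hat-matrix diagonal, via $\hat y_{-i} = y_i - e_i/(1-H_{ii})$. Then choose one scale for the coefficients that minimizes the logistic log-loss of those LOO predictions. The following are our assumptions: no intercept, $\pm 1$ coding, and a simple grid search over the scale. The paper's exact parameterization and scale optimizer may differ.

```python
import numpy as np


def prevalidated_ridge(X, y, lam=1.0, scales=None):
    """Ridge fit on +/-1 labels, rescaled to minimize log-loss of its LOO predictions."""
    if scales is None:
        scales = np.linspace(0.0, 10.0, 1001)
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)
    w = np.linalg.solve(A, X.T @ y)
    hat_diag = np.einsum("ij,ji->i", X, np.linalg.solve(A, X.T))  # diag of X A^{-1} X^T
    residuals = y - X @ w
    loo_pred = y - residuals / (1.0 - hat_diag)   # exact leave-one-out predictions for ridge
    # logistic log-loss log(1 + exp(-y * c * f)) of the scaled LOO predictions
    losses = [np.mean(np.logaddexp(0.0, -y * c * loo_pred)) for c in scales]
    c_best = scales[int(np.argmin(losses))]
    return c_best * w, loo_pred


def predict_proba(w_scaled, X):
    """Estimated P(y = +1 | x) from the rescaled ridge coefficients."""
    return 1.0 / (1.0 + np.exp(-X @ w_scaled))
```

The LOO predictions reuse the matrix already factored for the ridge fit. Choosing the scale therefore adds only a one-dimensional search on top of a single model fit.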
When generative AI is given a prompt to display an image in a certain way or style, what it also means is telling AI to imagine. The request to imagine is an acknowledgment that it has a will to do so, not just the capability [or the possession of contents] to do so. This will…
The post GenAI regulation: Are deepfakes indicative of free will in LLMs? appeared first on Data Science Central.
( 22
min )
A podcast with CEO Ricky Sun of Ultipa. Relationship-rich graph structures can be quite complex and resource-intensive to process at scale when using conventional technology. This is particularly the case when it comes to searches that demand the computation to reach 30 hops or more into the graphs. …
The post High-performance computing’s role in real-time graph analytics appeared first on Data Science Central.
( 20
min )
With the advent of generative AI, today’s foundation models (FMs), such as the large language models (LLMs) Claude 2 and Llama 2, can perform a range of generative tasks such as question answering, summarization, and content creation on text data. However, real-world data exists in multiple modalities, such as text, images, video, and audio. Take […]
( 12
min )
Microsoft announces the AFMR Minority Serving Institutions grant recipients, advancing AI research focused on today’s most significant technical and societal challenges. The grant provides funding and access to Azure-hosted foundation models.
The post Announcing recipients of the AFMR Minority Serving Institutions grant appeared first on Microsoft Research.
( 8
min )
This week’s featured “In the NVIDIA Studio” 3D artist, Brandon Tieh, puts his artistic talents on full display with his whimsical scene “Magic Valley.”
( 7
min )
Counterfactual explanations, and their associated algorithmic recourse, are
typically leveraged to understand, explain, and potentially alter a prediction
coming from a black-box classifier. In this paper, we propose to extend the use
of counterfactuals to evaluate progress in sequential decision making tasks. To
this end, we introduce a model-agnostic modular framework, TraCE (Trajectory
Counterfactual Explanation) scores, which is able to distill and condense
progress in highly complex scenarios into a single value. We demonstrate
TraCE's utility across domains by showcasing its main properties in two case
studies spanning healthcare and climate change.
( 2
min )
Markov processes are widely used mathematical models for describing dynamic
systems in various fields. However, accurately simulating large-scale systems
at long time scales is computationally expensive due to the short time steps
required for accurate integration. In this paper, we introduce an inference
process that maps complex systems into a simplified representational space and
models large jumps in time. To achieve this, we propose Time-lagged Information
Bottleneck (T-IB), a principled objective rooted in information theory, which
aims to capture relevant temporal features while discarding high-frequency
information to simplify the simulation task and minimize the inference error.
Our experiments demonstrate that T-IB learns information-optimal
representations for accurately modeling the statistical properties and dynamics
of the original process at a selected time lag, outperforming existing
time-lagged dimensionality reduction methods.
( 2
min )
We consider the task of estimating functions belonging to a specific class of
nonsmooth functions, namely so-called tame functions. These functions appear in
a wide range of applications: training of deep neural networks, value functions of
mixed-integer programs, or wave functions of small molecules. We show that tame
functions are approximable by piecewise polynomials on any full-dimensional
cube. We then present the first ever mixed-integer programming formulation of
piecewise polynomial regression. Together, these can be used to estimate tame
functions. We demonstrate promising computational results.
( 2
min )
With the rise of powerful closed-source LLMs (ChatGPT, GPT-4), there is
increasing interest in distilling the capabilities of closed-source LLMs into
smaller open-source LLMs. Previous distillation methods usually prompt ChatGPT
to generate a set of instructions and answers for the student model to learn.
However, such a standard distillation approach neglects the merits and conditions
of the student model. Inspired by modern teaching principles, we design a
personalised distillation process, in which the student attempts to solve a
task first, then the teacher provides an adaptive refinement for the student to
improve. Instead of feeding the student the teacher's prior, personalised
distillation enables personalised learning for the student model, as it only
learns on examples it makes mistakes on and learns to improve its own
solution. On code generation, personalised distillation consistently
outperforms standard distillation with only one-third of the data. With only
2.5-3K personalised examples that incur a data-collection cost of 4-6 US dollars, we
boost CodeGen-mono-16B by 7% to achieve 36.4% pass@1 and StarCoder by 12.2% to
achieve 45.8% pass@1 on HumanEval.
( 2
min )
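The personalised data-collection loop from the distillation abstract can be sketched as follows. The callables `student_generate`, `teacher_refine` and `passes_tests` are hypothetical placeholders (for example, a local model, a call to the teacher, and unit-test execution); the abstract does not specify these interfaces. The sketch illustrates the selection rule: it collects only the student's own failed attempts, each paired with the teacher's refinement of it.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Example:
    task: str
    student_attempt: str
    refined_solution: str


def collect_personalised_data(
    tasks: List[str],
    student_generate: Callable[[str], str],       # hypothetical student call
    teacher_refine: Callable[[str, str], str],    # hypothetical teacher call
    passes_tests: Callable[[str, str], bool],     # hypothetical test runner
) -> List[Example]:
    """Keep only tasks the student gets wrong, and ask the teacher to refine
    the student's own attempt rather than write a solution from scratch."""
    data = []
    for task in tasks:
        attempt = student_generate(task)
        if passes_tests(task, attempt):
            continue  # the student already solves this task: nothing to learn
        refined = teacher_refine(task, attempt)
        data.append(Example(task, attempt, refined))
    return data
```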
The performance of data fusion and tracking algorithms often depends on
parameters that not only describe the sensor system, but can also be
task-specific. While for the sensor system tuning these variables is
time-consuming and mostly requires expert knowledge, intrinsic parameters of
targets under track can even be completely unobservable until the system is
deployed. With state-of-the-art sensor systems growing more and more complex,
the number of parameters naturally increases, necessitating the automatic
optimization of the model variables. In this paper, the parameters of an
interacting multiple model (IMM) filter are optimized solely using
measurements, thus without the need for any ground-truth data. The resulting
method is evaluated through an ablation study on simulated data, where the
trained model manages to match the performance of a filter parametrized with
ground-truth values.
( 2
min )
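One common way to tune filter parameters from measurements alone is to maximise the likelihood of the innovations (the measurement prediction errors), which requires no ground-truth states. The sketch below assumes this loss and simplifies the IMM filter to a single scalar random-walk Kalman filter. It is not claimed to be the paper's objective, and the names `innovation_nll` and `fit_process_noise` are hypothetical.

```python
import numpy as np


def innovation_nll(y, q, r, m0=0.0, p0=1.0):
    """Negative log-likelihood of scalar measurements y under a random-walk
    Kalman filter with process noise q and measurement noise r, computed from
    the innovations alone (no ground-truth states needed)."""
    m, p, nll = m0, p0, 0.0
    for obs in y:
        p = p + q                      # predict (random walk: x_t = x_{t-1} + w)
        s = p + r                      # innovation variance
        innov = obs - m                # measurement prediction error
        nll += 0.5 * (np.log(2 * np.pi * s) + innov ** 2 / s)
        k = p / s                      # Kalman gain
        m, p = m + k * innov, (1 - k) * p
    return nll


def fit_process_noise(y, r, grid):
    """Pick the process-noise value on a grid that best explains the data."""
    return grid[int(np.argmin([innovation_nll(y, q, r) for q in grid]))]
```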
Training offline reinforcement learning (RL) models using visual inputs poses
two significant challenges: the overfitting problem in representation
learning and the overestimation bias in expected future rewards. Recent work
has attempted to alleviate the overestimation bias by encouraging conservative
behaviors. This paper, in contrast, tries to build more flexible constraints
for value estimation without impeding the exploration of potential advantages.
The key idea is to leverage off-the-shelf RL simulators, which can be easily
interacted with in an online manner, as the "test bed" for offline policies. To
enable effective online-to-offline knowledge transfer, we introduce CoWorld, a
model-based RL approach that mitigates cross-domain discrepancies in state and
reward spaces. Experimental results demonstrate the effectiveness of CoWorld,
outperforming existing RL approaches by large margins.
( 2
min )
The age- and stroke-associated decline in musculoskeletal strength degrades
the ability to perform daily tasks using the upper extremities. Although
a few exoskeletons exist, they require manual operation because they lack
sensor feedback and cannot predict the intended movement. Here,
we introduce an intelligent upper-limb exoskeleton system that uses cloud-based
deep learning to predict human intention for strength augmentation. The
embedded soft wearable sensors provide sensory feedback by collecting real-time
muscle signals, which are simultaneously computed to determine the user's
intended movement. The cloud-based deep learning model predicts four upper-limb joint
motions with an average accuracy of 96.2% at a 200-250 millisecond response
time, suggesting that the exoskeleton can be operated by human intention alone. In
addition, an array of soft pneumatics assists the intended movements by
providing up to 897 newtons of force and 78.7 millimeters of displacement.
Collectively, the intent-driven exoskeleton can augment human strength by 5.15
times on average compared to the unassisted exoskeleton. This report
demonstrates an exoskeleton robot that augments upper-limb joint movements
according to human intention, using cloud-based machine learning and sensory
feedback.
( 3
min )
Contrastive self-supervised learning has gained attention for its ability to
create high-quality representations from large unlabelled data sets. A key
reason that these powerful features enable data-efficient learning of
downstream tasks is that they provide augmentation invariance, which is often a
useful inductive bias. However, the preferred amount and type of invariance are
not known a priori and vary across different downstream tasks. We therefore
propose a multi-task self-supervised framework (MT-SLVR) that learns both
variant and invariant features in a parameter-efficient manner. Our multi-task
representation provides a strong and flexible feature that benefits diverse
downstream tasks. We evaluate our approach on few-shot classification tasks
drawn from a variety of audio domains and demonstrate improved classification
performance on all of them.
( 2
min )
Low-rank matrix completion consists of computing a matrix of minimal
complexity that recovers a given set of observations as accurately as possible.
Unfortunately, existing methods for matrix completion are heuristics that,
while highly scalable and often identifying high-quality solutions, do not
possess any optimality guarantees. We reexamine matrix completion with an
optimality-oriented eye. We reformulate these low-rank problems as convex
problems over the non-convex set of projection matrices and implement a
disjunctive branch-and-bound scheme that solves them to certifiable optimality.
Further, we derive a novel and often tight class of convex relaxations by
decomposing a low-rank matrix as a sum of rank-one matrices and incentivizing
that two-by-two minors in each rank-one matrix have determinant zero. In
numerical experiments, our new convex relaxations decrease the optimality gap
by two orders of magnitude compared to existing attempts, and our disjunctive
branch-and-bound scheme solves $n \times n$ rank-$r$ matrix completion problems to
certifiable optimality in hours for $n \leq 150$ and $r \leq 5$.
( 2
min )
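The rank-one condition used in the matrix completion relaxation can be checked numerically. A matrix has rank at most one exactly when all of its two-by-two minors vanish. The helper below is a hypothetical illustration, not the paper's relaxation. It shows that each rank-one term satisfies the condition, while a sum of two such terms generally does not.

```python
from itertools import combinations

import numpy as np


def max_abs_2x2_minor(M):
    """Largest |M[i,k]*M[j,l] - M[i,l]*M[j,k]| over all row pairs (i, j) and
    column pairs (k, l). It is zero exactly when rank(M) <= 1."""
    rows, cols = M.shape
    best = 0.0
    for i, j in combinations(range(rows), 2):
        for k, l in combinations(range(cols), 2):
            best = max(best, abs(M[i, k] * M[j, l] - M[i, l] * M[j, k]))
    return best
```

For example, with `R1 = np.outer(u1, v1)` and `R2 = np.outer(u2, v2)`, both `R1` and `R2` have vanishing minors up to rounding, whereas `R1 + R2` (rank two) does not.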
In this study, we harness the information-theoretic Privacy Funnel (PF) model
to develop a method for privacy-preserving representation learning using an
end-to-end training framework. We rigorously address the trade-off between
obfuscation and utility. Both are quantified through the logarithmic loss, a
measure also recognized as self-information loss. This exploration deepens the
interplay between information-theoretic privacy and representation learning,
offering substantive insights into data protection mechanisms for both
discriminative and generative models. Importantly, we apply our model to
state-of-the-art face recognition systems. The model demonstrates adaptability
across diverse inputs, from raw facial images to derived or refined
embeddings, and is competent in tasks such as classification, reconstruction,
and generation.
( 2
min )
We present the first $\varepsilon$-differentially private, computationally
efficient algorithm that estimates the means of product distributions over
$\{0,1\}^d$ accurately in total-variation distance, whilst attaining the
optimal sample complexity to within polylogarithmic factors. Prior work had
either solved this problem efficiently and optimally under weaker notions of
privacy, or had solved it optimally while having exponential running times.
( 2
min )
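For contrast with the differentially private mean-estimation abstract, the textbook $\varepsilon$-DP baseline is the Laplace mechanism applied to the empirical coordinate means. The sketch below implements that baseline, not the paper's algorithm, and it is not claimed to attain the optimal sample complexity. It only illustrates what $\varepsilon$-differential privacy requires for this estimation task. The function name is hypothetical.

```python
import numpy as np


def laplace_mean_estimate(samples, eps, rng):
    """Baseline eps-DP estimate of the coordinate means of binary vectors.

    Replacing one of n rows changes each coordinate mean by at most 1/n, so
    the L1 sensitivity of the d-dimensional mean is d/n, and Laplace noise of
    scale d/(n*eps) per coordinate gives eps-differential privacy.
    """
    n, d = samples.shape
    noisy = samples.mean(axis=0) + rng.laplace(0.0, d / (n * eps), size=d)
    return np.clip(noisy, 0.0, 1.0)  # post-processing keeps the DP guarantee
```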
We propose an approach for continuous prediction of turn-taking and
backchanneling locations in spoken dialogue by fusing a neural acoustic model
with a large language model (LLM). Experiments on the Switchboard human-human
conversation dataset demonstrate that our approach consistently outperforms the
single-modality baseline models. We also develop a novel multi-task
instruction fine-tuning strategy to further benefit from LLM-encoded knowledge
for understanding the tasks and conversational contexts, leading to additional
improvements. Our approach demonstrates the potential of combined LLMs and
acoustic models for a more natural and conversational interaction between
humans and speech-enabled AI agents.
( 2
min )
We study the geometry of linear networks with one-dimensional convolutional
layers. The function spaces of these networks can be identified with
semi-algebraic families of polynomials admitting sparse factorizations. We
analyze the impact of the network's architecture on the function space's
dimension, boundary, and singular points. We also describe the critical points
of the network's parameterization map. Furthermore, we study the optimization
problem of training a network with the squared error loss. We prove that for
architectures where all strides are larger than one and generic data, the
non-zero critical points of that optimization problem are smooth interior
points of the function space. This property is known to be false for dense
linear networks and linear convolutional networks with stride one.
( 2
min )
Performing classification on noisy, crowdsourced image datasets can prove
challenging even for the best neural networks. Two issues which complicate the
problem on such datasets are class imbalance and ground-truth uncertainty in
labeling. The AL-ALL and AL-PUB datasets - consisting of tightly cropped,
individual characters from images of ancient Greek papyri - are strongly
affected by both issues. The application of ensemble modeling to such datasets
can help identify images where the ground-truth is questionable and quantify
the trustworthiness of those samples. As such, we apply stacked generalization
consisting of nearly identical ResNets with different loss functions: one
utilizing sparse cross-entropy (CXE) and the other Kullback-Leibler Divergence
(KLD). Both networks use labels drawn from a crowd-sourced consensus. This
consensus is derived from a Normalized Distribution of Annotations (NDA) based
on all annotations for a given character in the dataset. For the second
network, the KLD is calculated with respect to the NDA. For our ensemble model,
we apply a k-nearest neighbors model to the outputs of the CXE and KLD
networks. Individually, the ResNet models have approximately 93% accuracy,
while the ensemble model achieves an accuracy of > 95%, increasing the
classification trustworthiness. We also perform an analysis of the Shannon
entropy of the various models' output distributions to measure classification
uncertainty. Our results suggest that entropy is useful for predicting model
misclassifications.
( 3
min )
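The quantities in the papyri ensemble pipeline are simple to compute. The sketch makes two assumptions, which are readings of the abstract rather than confirmed details. First, the NDA for a character is taken to be its annotation counts normalised to sum to one. Second, the KLD is taken as KL(NDA || prediction). The function names are hypothetical.

```python
import numpy as np


def normalized_distribution_of_annotations(counts):
    """Turn per-class annotation counts for one character into a distribution."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum()


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; eps guards against log(0) when q has zero entries."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / np.maximum(q[mask], eps))))


def shannon_entropy(p):
    """Entropy (nats) of a predicted class distribution; higher = less certain."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))
```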
In this paper, we study the expressivity of scalar, Markovian reward
functions in Reinforcement Learning (RL), and identify several limitations to
what they can express. Specifically, we look at three classes of RL tasks:
multi-objective RL, risk-sensitive RL, and modal RL. For each class, we derive
necessary and sufficient conditions that describe when a problem in this class
can be expressed using a scalar, Markovian reward. Moreover, we find that
scalar, Markovian rewards are unable to express most of the instances in each
of these three classes. We thereby contribute to a more complete understanding
of what standard reward functions can and cannot express. In addition,
we call attention to modal problems as a new class of problems, since they
have so far not been given any systematic treatment in the RL literature. We
also briefly outline some approaches for solving some of the problems we
discuss, by means of bespoke RL algorithms.
( 2
min )
This paper introduces a novel approach to enumerate and assess trapping sets
in quasi-cyclic codes whose circulant sizes are non-prime numbers.
Leveraging the quasi-cyclic properties, the method employs a tabular technique
to streamline the importance sampling step for estimating the pseudo-codeword
weight of trapping sets. The presented methodology draws on the mathematical
framework established in the provided theorem, which elucidates the behavior of
projection and lifting transformations on pseudo-codewords.
( 2
min )
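The trapping-set abstract estimates pseudo-codeword weight. A common definition of this weight, for the AWGN channel, is $(\sum_i \omega_i)^2 / \sum_i \omega_i^2$ for a nonnegative pseudo-codeword $\omega$. The abstract does not state which channel or weight definition it uses, so the AWGN choice is an assumption. The sketch computes this weight only; the enumeration and importance sampling steps are not shown.

```python
import numpy as np


def awgn_pseudo_weight(omega):
    """AWGN-channel pseudo-weight of a nonnegative pseudo-codeword omega:
    (sum_i omega_i)^2 / sum_i omega_i^2. For a 0/1 codeword it equals the
    Hamming weight."""
    omega = np.asarray(omega, dtype=float)
    if np.any(omega < 0):
        raise ValueError("pseudo-codewords have nonnegative entries")
    denom = np.sum(omega ** 2)
    return 0.0 if denom == 0 else float(np.sum(omega) ** 2 / denom)
```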
The validation of global climate models is crucial to ensure the accuracy and
efficacy of model output. We introduce the spherical convolutional Wasserstein
distance to more comprehensively measure differences between climate models and
reanalysis data. This new similarity measure accounts for spatial variability
using convolutional projections and quantifies local differences in the
distribution of climate variables. We apply this method to evaluate the
historical model outputs of the Coupled Model Intercomparison Project (CMIP)
members by comparing them to observational and reanalysis data products.
Additionally, we investigate the progression from CMIP phase 5 to phase 6 and
find modest improvements in the phase 6 models regarding their ability to
produce realistic climatologies.
( 2
min )
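The one-dimensional building block of the climate-validation measure is simple: for two samples of equal size, the 1-Wasserstein distance is the mean absolute difference between their sorted values. The sketch below averages that distance over non-overlapping patches of a regular latitude-longitude grid. This flat-patch averaging is a simplifying assumption standing in for the spherical convolutional projections, and the function names are hypothetical.

```python
import numpy as np


def wasserstein1_equal_size(a, b):
    """W1 between two empirical distributions with the same number of samples:
    the mean absolute difference of the sorted samples."""
    a, b = np.sort(np.ravel(a)), np.sort(np.ravel(b))
    return float(np.mean(np.abs(a - b)))


def local_wasserstein(model, reference, window):
    """Average W1 over non-overlapping window x window patches of a lat-lon grid.

    Simplification (an assumption, not the paper's method): flat patches on a
    regular grid stand in for the spherical convolutional projections.
    """
    n_lat, n_lon = model.shape
    scores = [
        wasserstein1_equal_size(model[i:i + window, j:j + window],
                                reference[i:i + window, j:j + window])
        for i in range(0, n_lat - window + 1, window)
        for j in range(0, n_lon - window + 1, window)
    ]
    return float(np.mean(scores))
```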
Much of the research in differential privacy has focused on offline
applications with the assumption that all data is available at once. When these
algorithms are applied in practice to streams where data is collected over
time, this either violates the privacy guarantees or results in poor utility.
We derive an algorithm for differentially private synthetic streaming data
generation, especially curated towards spatial datasets. Furthermore, we
provide a general framework for online selective counting among a collection of
queries which forms a basis for many tasks such as query answering and
synthetic data generation. The utility of our algorithm is verified on both
real-world and simulated datasets.
( 2
min )
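As background for online counting under differential privacy, consider the classic binary (tree) mechanism. It releases a running count at every time step, and its error grows only polylogarithmically in the stream length. This is not the paper's spatial algorithm. The sketch assumes a 0/1 stream and is shown only to illustrate how continual release can stay private; the function name is hypothetical.

```python
import numpy as np


def binary_mechanism(stream, eps, rng):
    """Classic eps-DP continual counting via the binary (tree) mechanism.

    Each element enters at most `levels` dyadic partial sums, so each noisy
    partial sum gets Laplace noise of scale levels / eps.
    """
    T = len(stream)
    levels = max(1, T.bit_length())
    scale = levels / eps
    exact = [0.0] * levels   # exact dyadic partial sums
    noisy = [0.0] * levels   # their noisy releases
    counts = []
    for t in range(1, T + 1):
        i = (t & -t).bit_length() - 1            # index of lowest set bit of t
        exact[i] = sum(exact[:i]) + stream[t - 1]
        for j in range(i):                        # lower levels merge into level i
            exact[j] = noisy[j] = 0.0
        noisy[i] = exact[i] + rng.laplace(0.0, scale)
        # Running count = sum of noisy partial sums for the set bits of t
        counts.append(sum(noisy[j] for j in range(levels) if (t >> j) & 1))
    return np.array(counts)
```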
One problem with researching cognitive modeling and reinforcement learning
(RL) is that researchers spend too much time on setting up an appropriate
computational framework for their experiments. Many open source implementations
of current RL algorithms exist, but there is a lack of a modular suite of tools
combining different robotic simulators and platforms, data visualization,
hyperparameter optimization, and baseline experiments. To address this problem,
we present Scilab-RL, a software framework for efficient research in cognitive
modeling and reinforcement learning for robotic agents. The framework focuses
on goal-conditioned reinforcement learning using Stable Baselines 3 and the
OpenAI gym interface. It enables native possibilities for experiment
visualizations and hyperparameter optimization. We describe how these features
enable researchers to conduct experiments with minimal time and effort, thus
maximizing research output.
( 2
min )
Partial differential equations (PDEs) are commonly employed to model complex
industrial systems characterized by multivariable dependence. Existing
physics-informed neural networks (PINNs) excel in solving PDEs in a homogeneous
medium. However, their feasibility is diminished when PDE parameters are
unknown due to a lack of physical attributions, and when the time-varying
interface arising from heterogeneous media is unavailable. To this end, we propose a
data-physics-hybrid method, physically informed synchronic-adaptive learning
(PISAL), to solve PDEs for industrial systems modeling in heterogeneous media.
First, Net1, Net2, and NetI are constructed to approximate the solutions
satisfying PDEs and the interface. Net1 and Net2 are utilized to synchronously
learn each solution satisfying PDEs with diverse parameters, while NetI is
employed to adaptively learn the unavailable time-varying interface. Then, a
criterion combined with NetI is introduced to adaptively distinguish the
attributions of measurements and collocation points. Furthermore, NetI is
integrated into a data-physics-hybrid loss function. Accordingly, a
synchronic-adaptive learning (SAL) strategy is proposed to decompose and
optimize each subdomain. Besides, we theoretically prove the approximation
capability of PISAL. Extensive experimental results verify that the proposed
PISAL can be used for industrial systems modeling in heterogeneous media, even
when physical attributions are lacking and the time-varying interface is
unavailable.
( 2
min )
Prompt design and engineering has become an important discipline in just the
past few months. In this paper, we provide an introduction to the main concepts
as well as review basic and more advanced approaches to prompt design and
engineering.
( 2
min )
We investigate the problem of learning Linear Quadratic Regulators (LQR) in a
multi-task, heterogeneous, and model-free setting. We characterize the
stability and personalization guarantees of a Policy Gradient-based (PG)
Model-Agnostic Meta-Learning (MAML) (Finn et al., 2017) approach for the LQR
problem under different task-heterogeneity settings. We show that the MAML-LQR
approach produces a stabilizing controller close to each task-specific optimal
controller up to a task-heterogeneity bias for both model-based and model-free
settings. Moreover, in the model-based setting, we show that this controller is
achieved with a linear convergence rate, which improves upon sub-linear rates
presented in existing MAML-LQR work. In contrast to existing MAML-LQR results,
our theoretical guarantees demonstrate that the learned controller can
efficiently adapt to unseen LQR tasks.
( 2
min )
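The model-based policy gradient behind a MAML-style LQR update can be sketched for small systems. The sketch uses the known closed-form gradient of the LQR cost with respect to a static gain $K$, with $u = -Kx$ and the cost averaged over an initial-state covariance $\Sigma_0$. It then takes one inner adaptation step on a single task. The outer meta-update across tasks and the model-free gradient estimates are omitted, and the function names are hypothetical.

```python
import numpy as np


def dlyap(a, q):
    """Solve X = a @ X @ a.T + q by vectorisation (adequate for small systems)."""
    n = a.shape[0]
    x = np.linalg.solve(np.eye(n * n) - np.kron(a, a), q.reshape(-1))
    return x.reshape(n, n)


def lqr_cost_and_grad(K, A, B, Q, R, Sigma0):
    """Cost J(K) = tr(P_K Sigma0) of u = -K x on x' = A x + B u, and its
    model-based policy gradient 2 [(R + B^T P B) K - B^T P A] Sigma_K."""
    Acl = A - B @ K                            # closed-loop dynamics
    P = dlyap(Acl.T, Q + K.T @ R @ K)          # closed-loop value matrix
    Sigma = dlyap(Acl, Sigma0)                 # accumulated state covariance
    grad = 2 * ((R + B.T @ P @ B) @ K - B.T @ P @ A) @ Sigma
    return float(np.trace(P @ Sigma0)), grad


def maml_adapt(K, task, eta):
    """One inner MAML step: a policy-gradient step on a single task's cost."""
    _, g = lqr_cost_and_grad(K, *task)
    return K - eta * g
```

The Kronecker-product solve is chosen only to keep the example dependency-free. For larger systems, a dedicated discrete Lyapunov solver would be preferable.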
Machine learning is about forecasting. Forecasts, however, obtain their
usefulness only through their evaluation. Machine learning has traditionally
focused on types of losses and their corresponding regret. Recently, the
machine learning community has regained interest in calibration. In this work, we
show the conceptual equivalence of calibration and regret in evaluating
forecasts. We frame the evaluation problem as a game between a forecaster, a
gambler and nature. Putting intuitive restrictions on gambler and forecaster,
calibration and regret naturally fall out of the framework. In addition, this
game links evaluation of forecasts to randomness of outcomes. Random outcomes
with respect to forecasts are equivalent to good forecasts with respect to
outcomes. We call those dual aspects, calibration and regret, predictiveness
and randomness, the four facets of forecast felicity.
( 2
min )
We present a manifold-based autoencoder method for learning nonlinear
dynamics in time, notably partial differential equations (PDEs), in which the
manifold latent space evolves according to Ricci flow. This can be accomplished
by simulating Ricci flow in a physics-informed setting, and manifold quantities
can be matched so that Ricci flow is empirically achieved. With our
methodology, the manifold is learned as part of the training procedure, so
ideal geometries may be discerned, while the evolution simultaneously induces a
more accommodating latent representation over static methods. We present our
method on a range of numerical experiments consisting of PDEs that encompass
desirable characteristics such as periodicity and randomness, reporting the error
in both in-distribution and extrapolation scenarios.
( 2
min )
Kalman filters provide a straightforward and interpretable means to estimate
hidden or latent variables, and have found numerous applications in control,
robotics, signal processing, and machine learning. One such application is
neural decoding for neuroprostheses. In 2020, Burkhart et al. thoroughly
evaluated their new version of the Kalman filter that leverages Bayes' theorem
to improve filter performance for highly non-linear or non-Gaussian observation
models. This work provides an open-source Python alternative to the authors'
MATLAB algorithm. Specifically, we reproduce their most salient results for
neuroscientific contexts and further examine the efficacy of their filter using
multiple random seeds and previously unused trials from the authors' dataset.
All experiments were performed offline on a single computer.
( 2
min )
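For reference, one predict-and-update step of the standard linear-Gaussian Kalman filter is sketched below. This is not the Bayesian variant evaluated by Burkhart et al., which targets non-linear or non-Gaussian observation models. It only shows the standard recursion that such variants modify. The function name is hypothetical.

```python
import numpy as np


def kalman_step(m, P, y, A, Q, H, R):
    """One predict + update step of the standard linear-Gaussian Kalman filter."""
    # Predict: propagate mean and covariance through the linear dynamics
    m_pred = A @ m
    P_pred = A @ P @ A.T + Q
    # Update: weigh the innovation y - H m_pred by the Kalman gain
    S = H @ P_pred @ H.T + R                   # innovation covariance
    K = np.linalg.solve(S, H @ P_pred).T       # P_pred H^T S^{-1} (S, P_pred symmetric)
    m_new = m_pred + K @ (y - H @ m_pred)
    P_new = (np.eye(len(m)) - K @ H) @ P_pred
    return m_new, P_new
```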
This paper serves as a comprehensive system description of version 2.0 of the
Marabou framework for formal analysis of neural networks. We discuss the tool's
architectural design and highlight the major features and components introduced
since its initial release.
( 2
min )
Gradient Langevin dynamics and a variety of its variants have attracted
increasing attention owing to their convergence towards the global optimal
solution, initially in the unconstrained convex framework and more recently even
for non-convex problems with convex constraints. In the present work, we extend those
frameworks to non-convex problems on a non-convex feasible region with a global
optimization algorithm built upon reflected gradient Langevin dynamics and
derive its convergence rates. By effectively making use of its reflection at
the boundary in combination with the probabilistic representation for the
Poisson equation with the Neumann boundary condition, we present promising
convergence rates that are, in particular, faster than the existing rate for
non-convex problems with convex constraints.
( 2
min )
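The reflection idea in reflected gradient Langevin dynamics can be illustrated on a simple box-shaped feasible region. After each noisy gradient step, any coordinate that leaves the box is mirrored back across the wall it crossed. The paper treats non-convex feasible regions, so the following are illustrative assumptions only: the box, the step size, the inverse temperature `beta`, the test objective, and the function names.

```python
import numpy as np


def reflect(x, lo, hi):
    """Reflect each coordinate of x back into [lo, hi] (mirror at the walls)."""
    width = hi - lo
    r = np.mod(x - lo, 2 * width)
    return lo + np.where(r > width, 2 * width - r, r)


def reflected_langevin(grad, x0, lo, hi, step, beta, n_iter, rng):
    """Gradient Langevin dynamics with reflection at the boundary of a box."""
    x = np.array(x0, dtype=float)
    path = [x.copy()]
    for _ in range(n_iter):
        noise = rng.normal(size=x.shape)
        x = reflect(x - step * grad(x) + np.sqrt(2 * step / beta) * noise, lo, hi)
        path.append(x.copy())
    return np.array(path)
```

For instance, `reflect(np.array([1.7, -1.6, 0.2]), -1.5, 1.5)` maps the two out-of-range points to 1.3 and -1.4 and leaves 0.2 unchanged.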
We consider the community detection problem in a sparse $q$-uniform
hypergraph $G$, assuming that $G$ is generated according to the Hypergraph
Stochastic Block Model (HSBM). We prove that a spectral method based on the
non-backtracking operator for hypergraphs works with high probability down to
the generalized Kesten-Stigum detection threshold conjectured by Angelini et
al. (2015). We characterize the spectrum of the non-backtracking operator for
the sparse HSBM and provide an efficient dimension reduction procedure using
the Ihara-Bass formula for hypergraphs. As a result, community detection for
the sparse HSBM on $n$ vertices can be reduced to an eigenvector problem of a
$2n\times 2n$ non-normal matrix constructed from the adjacency matrix and the
degree matrix of the hypergraph. To the best of our knowledge, this is the
first provable and efficient spectral algorithm that achieves the conjectured
threshold for HSBMs with $r$ blocks generated according to a general symmetric
probability tensor.
( 2
min )
Numerous robust estimators exist as alternatives to the maximum likelihood
estimator (MLE) when a completely observed ground-up loss severity sample
dataset is available. However, the options for robust alternatives to MLE
become significantly limited when dealing with grouped loss severity data, with
only a handful of methods like least squares, minimum Hellinger distance, and
optimal bounded influence function available. This paper introduces a novel
robust estimation technique, the Method of Truncated Moments (MTuM),
specifically designed to estimate the tail index of a Pareto distribution from
grouped data. Inferential justification of MTuM is established by employing the
central limit theorem and validated through a comprehensive simulation
study.
( 2
min )
Machine learning is about forecasting. Forecasts, however, obtain their
usefulness only through their evaluation. Machine learning has traditionally
focused on types of losses and their corresponding regret. Currently, the
machine learning community regained interest in calibration. In this work, we
show the conceptual equivalence of calibration and regret in evaluating
forecasts. We frame the evaluation problem as a game between a forecaster, a
gambler and nature. Putting intuitive restrictions on gambler and forecaster,
calibration and regret naturally fall out of the framework. In addition, this
game links evaluation of forecasts to randomness of outcomes. Random outcomes
with respect to forecasts are equivalent to good forecasts with respect to
outcomes. We call those dual aspects, calibration and regret, predictiveness
and randomness, the four facets of forecast felicity.
( 2
min )
We present a manifold-based autoencoder method for learning nonlinear
dynamics in time, notably partial differential equations (PDEs), in which the
manifold latent space evolves according to Ricci flow. This can be accomplished
by simulating Ricci flow in a physics-informed setting, and manifold quantities
can be matched so that Ricci flow is empirically achieved. With our
methodology, the manifold is learned as part of the training procedure, so
ideal geometries may be discerned, while the evolution simultaneously induces a
more accommodating latent representation over static methods. We present our
method on a range of numerical experiments consisting of PDEs that encompass
desirable characteristics such as periodicity and randomness, remarking error
on in-distribution and extrapolation scenarios.
( 2
min )
Kalman filters provide a straightforward and interpretable means to estimate
hidden or latent variables, and have found numerous applications in control,
robotics, signal processing, and machine learning. One such application is
neural decoding for neuroprostheses. In 2020, Burkhart et al. thoroughly
evaluated their new version of the Kalman filter that leverages Bayes' theorem
to improve filter performance for highly non-linear or non-Gaussian observation
models. This work provides an open-source Python alternative to the authors'
MATLAB algorithm. Specifically, we reproduce their most salient results for
neuroscientific contexts and further examine the efficacy of their filter using
multiple random seeds and previously unused trials from the authors' dataset.
All experiments were performed offline on a single computer.
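For readers unfamiliar with the baseline being extended, the sketch below shows one predict/update cycle of the standard linear-Gaussian Kalman filter in NumPy. It is not Burkhart et al.'s Bayesian variant, which changes how the observation model is handled. The matrix names `F`, `Q`, `H`, `R` and the helper `kalman_step` are conventional, hypothetical choices rather than names from their code:

```python
import numpy as np

def kalman_step(x, P, z, F, Q, H, R):
    """One predict/update cycle of the linear Kalman filter.

    x, P: prior state mean and covariance; z: new observation.
    F, Q: state transition matrix and process noise covariance.
    H, R: observation matrix and observation noise covariance.
    """
    # Predict: propagate mean and covariance through the linear dynamics.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update: weight the innovation (z - H x_pred) by the Kalman gain.
    S = H @ P_pred @ H.T + R               # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)    # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```

Take a scalar state with unit prior variance and unit observation noise. Observing z = 2 then moves the estimate halfway, to x = 1 with variance 0.5.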
When deploying a large language model (LLM), machine learning (ML) practitioners typically care about two measurements for model serving performance: latency, defined by the time it takes to generate a single token, and throughput, defined by the number of tokens generated per second. Although a single request to the deployed endpoint would exhibit a throughput […]
Hip disorders, comprising some of the world’s most common joint diseases, are especially prevalent among adolescents and young adults, causing stiffness, pain or a limp. But they can be hard to diagnose using solely 2D medical imaging. Helping to treat these disorders, the Boston Children’s Hospital’s (BCH’s) Adolescent and Young Adult Hip Preservation Program is […]
A model on its own is typically not enough. It also requires data in a very specific format, and that format must be the same one used at inference or prediction time.
The post From MLOps to LLMOps— and hardware headaches ahead appeared first on Data Science Central.
This article explores the versatile applications of healthcare chatbots, shedding light on their transformative impact on patient care and medical processes.
The post Revolutionizing healthcare with chatbots: A humanized exploration appeared first on Data Science Central.
Swarms of autonomous interactive drones, with the support of recharging
technology, can provide compelling sensing capabilities in Smart Cities, such
as traffic monitoring and disaster response. Existing approaches, including
distributed optimization and deep reinforcement learning (DRL), aim to
coordinate drones to achieve cost-effective, high-quality navigation, sensing,
and charging. However, they face major challenges: short-term optimization is
not effective in dynamic environments with unanticipated changes, while
long-term learning lacks scalability, resilience, and flexibility. To bridge
this gap, this paper introduces a new progressive approach that combines
short-term plan generation and selection based on distributed optimization with
a DRL-based long-term strategic scheduling of flying direction. Extensive
experimentation with datasets generated from realistic urban mobility
shows that the proposed solution outperforms the state of the art. We also
provide compelling new insights about the role of drone density in different
sensing missions, the energy safety of drone operations, and how to prioritize
investments in key locations of charging infrastructure.
The recent introduction of the Least-Squares Support Vector Regression
(LS-SVR) algorithm for solving differential and integral equations has sparked
interest. In this study, we expand the application of this algorithm to address
systems of differential-algebraic equations (DAEs). Our work presents a novel
approach to solving general DAEs in an operator format by establishing
connections between the LS-SVR machine learning model, weighted residual
methods, and Legendre orthogonal polynomials. To assess the effectiveness of
our proposed method, we conduct simulations involving various DAE scenarios,
such as nonlinear systems, fractional-order derivatives, integro-differential
DAEs, and partial DAEs. Finally, we compare the proposed method with
established state-of-the-art approaches, demonstrating its reliability and
effectiveness.
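The weighted-residual idea with a Legendre basis can be sketched on a single scalar ODE rather than a DAE system. This minimal sketch uses plain least-squares collocation. It omits the regularization and the kernel (dual) formulation that distinguish LS-SVR. `solve_decay_ode` and its parameters are hypothetical:

```python
import numpy as np
from numpy.polynomial import legendre as L

def solve_decay_ode(n_basis=12, n_points=40, weight=10.0):
    """Least-squares Legendre approximation of y' = -y on [-1, 1] with y(-1) = 1."""
    t = np.linspace(-1.0, 1.0, n_points)
    eye = np.eye(n_basis)
    # Column k holds P_k(t_i) and P_k'(t_i) for the k-th Legendre polynomial.
    vals = np.column_stack([L.legval(t, eye[k]) for k in range(n_basis)])
    ders = np.column_stack([L.legval(t, L.legder(eye[k])) for k in range(n_basis)])
    # ODE residual rows: y'(t_i) + y(t_i) = 0 at every collocation point.
    A = ders + vals
    rhs = np.zeros(n_points)
    # Extra row for the initial condition y(-1) = 1, scaled by `weight`.
    ic_row = np.array([L.legval(-1.0, eye[k]) for k in range(n_basis)])
    A = np.vstack([A, weight * ic_row])
    rhs = np.append(rhs, weight)
    coeffs, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return coeffs

coeffs = solve_decay_ode()
t_test = np.linspace(-1.0, 1.0, 7)
# Compare against the exact solution y(t) = exp(-(t + 1)).
max_err = np.max(np.abs(L.legval(t_test, coeffs) - np.exp(-(t_test + 1.0))))
```

The initial condition enters as a weighted residual row rather than a hard constraint. This mirrors how weighted residual methods fold all conditions into a single least-squares problem.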
The generation of undesirable and factually incorrect content by large
language models (LLMs) poses a significant challenge and remains a largely
unsolved issue. This paper studies the integration of a contrastive learning objective
for fine-tuning LLMs for implicit knowledge editing and controlled text
generation. Optimizing the training objective entails aligning text
perplexities in a contrastive fashion. To facilitate training the model in a
self-supervised fashion, we leverage an off-the-shelf LLM for training data
generation. We showcase its applicability in the domain of detoxification. There,
the proposed approach leads to a significant decrease in the generation of
toxic content while preserving general utility for downstream tasks such as
commonsense reasoning and reading comprehension. The proposed approach is
conceptually simple but empirically powerful.
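The objective "aligns text perplexities in a contrastive fashion". To make that concrete, the sketch below computes perplexity from per-token log-probabilities and applies a margin hinge between a desired continuation and an undesired one. The hinge form, the `margin` parameter, and both function names are our assumptions. The paper's exact objective may differ:

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(-mean natural-log probability per token)."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

def contrastive_margin_loss(pos_log_probs, neg_log_probs, margin=1.0):
    """Hypothetical hinge loss: push the log-perplexity of a desired (e.g. non-toxic)
    continuation at least `margin` below that of an undesired (e.g. toxic) one."""
    log_ppl_pos = -sum(pos_log_probs) / len(pos_log_probs)
    log_ppl_neg = -sum(neg_log_probs) / len(neg_log_probs)
    return max(0.0, margin + log_ppl_pos - log_ppl_neg)
```

A model that assigns probability 0.25 to each of four tokens has perplexity 4. The loss is zero once the undesired text is already much less likely than the desired one.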
Electronic Health Record (EHR) data, while rich in information, often suffers
from sparsity, posing significant challenges in predictive modeling.
Traditional imputation methods inadequately distinguish between real and
imputed data, leading to potential inaccuracies in models. Addressing this, we
introduce PRISM, a novel approach that indirectly imputes data through
prototype representations of similar patients, thus ensuring denser and more
accurate embeddings. PRISM innovates further with a feature confidence learner
module, which evaluates the reliability of each feature in light of missing
data. Additionally, it incorporates a novel patient similarity metric that
accounts for feature confidence, avoiding overreliance on imprecise imputed
values. Our extensive experiments on the MIMIC-III and MIMIC-IV datasets
demonstrate PRISM's superior performance on in-hospital mortality and 30-day
readmission prediction tasks, showcasing its effectiveness in handling EHR data
sparsity. For the sake of reproducibility and further research, we have made
the code publicly available at https://github.com/yhzhu99/PRISM.
"You never forget how to ride a bike," but how is that possible? The brain
is able to learn complex skills, stop the practice for years, learn other
skills in between, and still retrieve the original knowledge when necessary.
The mechanisms of this capability, referred to as lifelong learning (or
continual learning, CL), are unknown. We suggest a bio-plausible
meta-plasticity rule building on classical work in CL which we summarize in two
principles: (i) neurons are context selective, and (ii) a local availability
variable partially freezes the plasticity if the neuron was relevant for
previous tasks. In a new neuro-centric formalization of these principles, we
suggest that neuron selectivity and neuron-wide consolidation are a simple and
viable meta-plasticity hypothesis to enable CL in the brain. In simulation,
this simple model balances forgetting and consolidation leading to better
transfer learning than contemporary CL algorithms on image recognition and
natural language processing CL benchmarks.
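Principle (ii) describes a per-neuron availability variable that partially freezes plasticity. It can be pictured as a gate on weight updates. This minimal sketch rests on our own assumptions: each row of `W` holds one neuron's incoming weights, relevance scores in [0, 1] are given, and the multiplicative forms of the hypothetical helpers `gated_update` and `consolidate` are illustrative, not the paper's rule. Context selectivity (principle i) is not modeled:

```python
import numpy as np

def gated_update(W, grad, availability, lr=0.1):
    """Scale each neuron's (row's) update by its availability in [0, 1].

    availability = 1 means fully plastic; availability = 0 means frozen.
    """
    return W - lr * availability[:, None] * grad

def consolidate(availability, relevance, rate=0.5):
    """After a task, lower each neuron's availability in proportion to its relevance."""
    return availability * (1.0 - rate * np.clip(relevance, 0.0, 1.0))
```

A neuron that mattered for an earlier task keeps only part of its plasticity. Neurons that were irrelevant stay fully available for new tasks.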
Edge Intelligence (EI) integrates Edge Computing (EC) and Artificial
Intelligence (AI) to push the capabilities of AI to the network edge for
real-time, efficient and secure intelligent decision-making and computation.
However, EI faces various challenges due to resource constraints, heterogeneous
network environments, and diverse service requirements of different
applications, which together affect the trustworthiness of EI in the eyes of
stakeholders. This survey comprehensively summarizes the characteristics,
architecture, technologies, and solutions of trustworthy EI. Specifically, we
first emphasize the need for trustworthy EI in the context of the trend toward
large models. We then provide an initial definition of trustworthy EI, explore
its key characteristics, and propose a multi-layered architecture for trustworthy
EI. Next, we summarize several important issues that hinder the achievement of
trustworthy EI. Subsequently, we present enabling technologies for trustworthy
EI systems and provide an in-depth literature review of the state-of-the-art
solutions for realizing the trustworthiness of EI. Finally, we discuss the
corresponding research challenges and open issues.
Depression is a global burden and one of the most challenging mental health
conditions to control. Experts can detect its severity early using the Beck
Depression Inventory (BDI) questionnaire, administer appropriate medication to
patients, and impede its progression. Due to the fear of potential
stigmatization, many patients turn to social media platforms like Reddit for
advice and assistance at various stages of their journey. This research
extracts text from Reddit to facilitate the diagnostic process. It employs a
proposed labeling approach to categorize the text and subsequently fine-tunes
the Longformer model. The model's performance is compared against baseline
models, including Naive Bayes, Random Forest, Support Vector Machines, and
Gradient Boosting. Our findings reveal that the Longformer model outperforms
the baseline models in both English (48%) and Luganda (45%) languages on a
custom-made dataset.
Label-free cell classification is advantageous for supplying pristine cells
for further use or examination, yet existing techniques frequently fall short
in terms of specificity and speed. In this study, we address these limitations
through the development of a novel machine learning framework, Multiplex Image
Machine Learning (MIML). This architecture uniquely combines label-free cell
images with biomechanical property data, harnessing the vast, often
underutilized morphological information intrinsic to each cell. By integrating
both types of data, our model offers a more holistic understanding of the
cellular properties, utilizing morphological information typically discarded in
traditional machine learning models. This approach achieves 98.3% accuracy
in cell classification, a substantial improvement over models that consider
only a single data type. MIML has proven effective in classifying white blood
cells and tumor cells, with potential for broader application due to its
inherent flexibility and transfer learning capability. It is particularly
effective for cells with similar morphology but distinct biomechanical
properties. This innovative approach has significant implications
across various fields, from advancing disease diagnostics to understanding
cellular behavior.
Due to the continuous change in operational data, AIOps solutions suffer from
performance degradation over time. Although periodic retraining is the
state-of-the-art technique for preserving the performance of AIOps failure
prediction models over time, it requires a considerable amount of labeled
data. In AIOps, obtaining labeled data is expensive because it requires
domain experts to annotate it intensively. In this paper,
we present McUDI, a model-centric unsupervised degradation indicator that is
capable of detecting the exact moment the AIOps model requires retraining as a
result of changes in data. We further show how employing McUDI in the
maintenance pipeline of AIOps solutions can reduce the number of samples that
require annotation by 30k for job failure prediction and 260k for disk
failure prediction, while achieving performance similar to periodic
retraining.
Speech foundation models (SFMs) have been benchmarked on many speech
processing tasks, often achieving state-of-the-art performance with minimal
adaptation. However, the SFM paradigm has been significantly less explored for
applications of interest to the speech perception community. In this paper we
present a systematic evaluation of 10 SFMs on one such application: Speech
intelligibility prediction. We focus on the non-intrusive setup of the Clarity
Prediction Challenge 2 (CPC2), where the task is to predict the percentage of
words correctly perceived by hearing-impaired listeners from speech-in-noise
recordings. We propose a simple method that learns a lightweight specialized
prediction head on top of frozen SFMs to approach the problem. Our results
reveal statistically significant differences in performance across SFMs. Our
method resulted in the winning submission in the CPC2, demonstrating its
promise for speech perception applications.
There is an evident lack of implementation of Machine Learning (ML) in the
legal domain in India, and any research that does take place in this domain is
usually based on data from the higher courts of law and works with English
data. The lower courts and data from the different regional languages of India
are often overlooked. In this paper, we deploy a Convolutional Neural Network
(CNN) architecture on a corpus of Hindi legal documents. We perform a bail
prediction task with a CNN model and achieve an overall accuracy of 93%,
an improvement on the benchmark accuracy set by Kapoor et al. (2022), albeit
on data from 20 districts of the Indian state of Uttar Pradesh.
We propose a novel algorithm for the support estimation of partially known
Gaussian graphical models that incorporates prior information about the
underlying graph. In contrast to classical approaches that provide a point
estimate based on a maximum likelihood or a maximum a posteriori criterion
using (simple) priors on the precision matrix, we consider a prior on the graph
and rely on annealed Langevin diffusion to generate samples from the posterior
distribution. Since the Langevin sampler requires access to the score function
of the underlying graph prior, we use graph neural networks to effectively
estimate the score from a graph dataset (either available beforehand or
generated from a known distribution). Numerical experiments demonstrate the
benefits of our approach.
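The sampler can be illustrated on a toy problem where the score is known in closed form: a 1D Gaussian. In the paper a graph neural network estimates the score of a graph prior. Here an analytic score stands in for it. `annealed_langevin`, `gaussian_score`, the noise schedule, and the step-size rule (step proportional to sigma squared, a common choice) are our assumptions:

```python
import numpy as np

def annealed_langevin(score, sigmas, n_samples=2000, steps=200, eps=0.01, seed=0):
    """Annealed Langevin dynamics: run Langevin steps at each noise level, largest first.

    score(x, sigma) must return the gradient of the log-density of the data
    perturbed with Gaussian noise of scale sigma.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, sigmas[0], size=n_samples)  # initialise from broad noise
    for sigma in sigmas:
        alpha = eps * (sigma / sigmas[-1]) ** 2  # step size scales with sigma^2
        for _ in range(steps):
            z = rng.standard_normal(n_samples)
            x = x + 0.5 * alpha * score(x, sigma) + np.sqrt(alpha) * z
    return x

# Toy target N(3, 0.5^2): perturbing with noise sigma gives N(3, 0.25 + sigma^2),
# whose score is known in closed form (a stand-in for a learned score network).
def gaussian_score(x, sigma, mu=3.0, var=0.25):
    return -(x - mu) / (var + sigma ** 2)

sigmas = np.geomspace(3.0, 0.1, 10)
samples = annealed_langevin(gaussian_score, sigmas)
```

The last noise level is 0.1, so the samples follow the slightly perturbed target, with mean near 3 and standard deviation about 0.51 rather than 0.5.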
Early diagnosis of Alzheimer's disease (AD) is a challenging task due to
its subtle and complex clinical symptoms. Deep learning-assisted medical
diagnosis using image recognition techniques has become an important research
topic in this field. The features have to accurately capture the main variations
of anatomical brain structures. However, extracting such features by training
deep networks is time-consuming and expensive. This study proposes a novel
Alzheimer's disease detection model based on Convolutional Neural Networks. The
model utilizes a pre-trained ResNet network as the backbone, incorporating a
post-fusion algorithm for 3D medical images and attention mechanisms. The
experimental results indicate that the employed 2D fusion algorithm effectively
reduces the model's training cost, and the introduced attention mechanism
accurately weights important regions in images, further enhancing the model's
diagnostic accuracy.
Semantic segmentation enables robots to perceive and reason about their
environments beyond geometry. Most such systems build upon deep learning
approaches. As autonomous robots are commonly deployed in initially unknown
environments, pre-training on static datasets cannot always capture the variety
of domains and limits the robot's perception performance during missions.
Recently, self-supervised and fully supervised active learning methods emerged
to improve a robot's vision. These approaches rely on large in-domain
pre-training datasets or require substantial human labelling effort. We propose
a planning method for semi-supervised active learning of semantic segmentation
that substantially reduces human labelling requirements compared to fully
supervised approaches. We leverage an adaptive map-based planner guided towards
the frontiers of unexplored space with high model uncertainty, which collects
training data for human labelling. A key aspect of our approach is to combine
the sparse high-quality human labels with pseudo labels automatically extracted
from highly certain environment map areas. Experimental results show that our
method reaches segmentation performance close to fully supervised approaches
with drastically reduced human labelling effort while outperforming
self-supervised approaches.
The majority of the research on the quantization of Deep Neural Networks
(DNNs) is focused on reducing the precision of tensors visible by high-level
frameworks (e.g., weights, activations, and gradients). However, current
hardware still relies on high-accuracy core operations. Most significant is the
operation of accumulating products. This high-precision accumulation operation
is gradually becoming the main computational bottleneck. This is because, so
far, the use of low-precision accumulators has led to a significant degradation
in performance. In this work, we present a simple method to train and fine-tune
high-end DNNs that allows, for the first time, the use of cheaper 12-bit
accumulators with no significant degradation in accuracy. Lastly, we show that
as we decrease the accumulation precision further, using fine-grained gradient
approximations can improve the DNN accuracy.
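Accumulator precision matters because of how a running sum of products is rounded. The toy below rounds the running sum after every addition. NumPy has no 12-bit float, so `np.float16` (11-bit significand) stands in for a narrow accumulator and `np.float32` for a wide one. This illustrates the problem only. It is not the paper's training method, and `accumulate` is a hypothetical helper:

```python
import numpy as np

def accumulate(products, dtype):
    """Sum products sequentially, rounding the running sum to `dtype` after every add."""
    acc = dtype(0)
    for p in products.astype(dtype):
        acc = dtype(acc + p)
    return float(acc)

rng = np.random.default_rng(0)
a = rng.uniform(0.0, 1.0, 4096)
b = rng.uniform(0.0, 1.0, 4096)
products = a * b                      # terms of a length-4096 dot product
exact = float(np.sum(products))       # float64 reference
err16 = abs(accumulate(products, np.float16) - exact)
err32 = abs(accumulate(products, np.float32) - exact)
print(f"float16 error: {err16:.3f}, float32 error: {err32:.6f}")
```

The true sum is about 1,000, and the float16 running sum falls well short of it. Once the sum is large, many individual products are smaller than half the spacing between representable float16 values, so they are rounded away.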
K-fold cross-validation is a widely used tool for assessing classifier
performance. The reproducibility crisis faced by artificial intelligence partly
results from the irreproducibility of reported k-fold cross-validation-based
performance scores. Recently, we introduced numerical techniques to test the
consistency of claimed performance scores and experimental setups. In a crucial
use case, the method relies on the combinatorial enumeration of all k-fold
configurations, for which we proposed an algorithm in the binary classification
case.
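For small datasets, the combinatorial enumeration can be illustrated by brute force. This minimal sketch assumes unshuffled folds whose sizes follow scikit-learn's `KFold` convention, where the first `n % k` folds get one extra sample. It treats folds of equal size as interchangeable. The function names are hypothetical, and this exhaustive search grows exponentially with `k`, unlike a dedicated algorithm:

```python
from itertools import product

def fold_sizes(n, k):
    """Fold sizes when the first n % k folds receive one extra sample."""
    return [n // k + (1 if i < n % k else 0) for i in range(k)]

def fold_configurations(n, n_pos, k):
    """Enumerate distinct ways to spread n_pos positives over k folds.

    Each configuration is a sorted tuple of (fold size, positives in fold) pairs,
    so folds of equal size that differ only in order count once.
    """
    sizes = fold_sizes(n, k)
    seen = set()
    for pos in product(*(range(s + 1) for s in sizes)):
        if sum(pos) == n_pos:
            seen.add(tuple(sorted(zip(sizes, pos))))
    return sorted(seen)

print(fold_configurations(6, 2, 3))
```

With 6 samples, 2 positives, and 3 folds of size 2, there are two configurations. Either both positives share a fold, or they sit in different folds.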
We consider the problem of learning linear operators under squared loss
between two infinite-dimensional Hilbert spaces in the online setting. We show
that the class of linear operators with uniformly bounded $p$-Schatten norm is
online learnable for any $p \in [1, \infty)$. On the other hand, we prove an
impossibility result by showing that the class of uniformly bounded linear
operators with respect to the operator norm is not online learnable.
Moreover, we show a separation between sequential uniform convergence and
online learnability by identifying a class of bounded linear operators that is
online learnable but for which uniform convergence does not hold. Finally, we prove that
the impossibility result and the separation between uniform convergence and
learnability also hold in the batch setting.
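The $p$-Schatten norm is the $\ell_p$ norm of an operator's singular values. With $p = 1$ it is the trace (nuclear) norm, with $p = 2$ the Hilbert-Schmidt (Frobenius) norm, and as $p \to \infty$ it tends to the operator norm, the case shown above not to be online learnable. The sketch uses finite matrices as stand-ins for operators between infinite-dimensional spaces, and `schatten_norm` is a hypothetical helper:

```python
import numpy as np

def schatten_norm(A, p):
    """p-Schatten norm of a matrix: the l_p norm of its singular values."""
    s = np.linalg.svd(A, compute_uv=False)
    if np.isinf(p):
        return float(s.max())  # operator (spectral) norm
    return float(np.sum(s ** p) ** (1.0 / p))

A = np.diag([3.0, 4.0])
print(schatten_norm(A, 1), schatten_norm(A, 2), schatten_norm(A, np.inf))
```

For this diagonal matrix, the singular values are 3 and 4. The three norms are therefore 7, 5, and 4.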
This paper considers stochastic weakly convex optimization without the
standard Lipschitz continuity assumption. Based on new adaptive regularization
(stepsize) strategies, we show that a wide class of stochastic algorithms,
including the stochastic subgradient method, preserve the $\mathcal{O} ( 1 /
\sqrt{K})$ convergence rate with constant failure rate. Our analyses rest on
rather weak assumptions: the Lipschitz parameter can be either bounded by a
general growth function of $\|x\|$ or locally estimated through independent
random samples.
We propose two graph neural network layers for graphs with features in a
Riemannian manifold. First, based on a manifold-valued graph diffusion
equation, we construct a diffusion layer that can be applied to an arbitrary
number of nodes and graph connectivity patterns. Second, we model a tangent
multilayer perceptron by transferring ideas from the vector neuron framework to
our general setting. Both layers are equivariant with respect to node
permutations and isometries of the feature manifold. These properties have been
shown to lead to a beneficial inductive bias in many deep learning tasks.
Numerical examples on synthetic data as well as on triangle meshes of the right
hippocampus to classify Alzheimer's disease demonstrate the very good
performance of our layers.
Implicit neural representations (INRs) are a rapidly growing research field,
which provides alternative ways to represent multimedia signals. Recent
applications of INRs include image super-resolution, compression of
high-dimensional signals, or 3D rendering. However, these solutions usually
focus on visual data, and adapting them to the audio domain is not trivial.
Moreover, these solutions require a separately trained model for every data sample. To
address this limitation, we propose HyperSound, a meta-learning method
leveraging hypernetworks to produce INRs for audio signals unseen at training
time. We show that our approach can reconstruct sound waves with quality
comparable to other state-of-the-art models.
The recent advances in natural language processing have predominantly favored
well-resourced English-centric models, resulting in a significant gap with
low-resource languages. In this work, we introduce the language model TURNA,
which is developed for the low-resource language Turkish and is capable of both
natural language understanding and generation tasks. TURNA uses an
encoder-decoder architecture based on the UL2 unified framework and is pretrained
on a diverse corpus that we specifically curated for this purpose. We evaluated
TURNA on three generation tasks and five understanding tasks for Turkish. The
results show that TURNA outperforms several multilingual models in both
understanding and generation tasks, and competes with monolingual Turkish
models in understanding tasks. TURNA is made available at
https://huggingface.co/boun-tabi-LMG/TURNA .
Machine learning typically presupposes classical probability theory which
implies that aggregation is built upon expectation. There are now multiple
reasons to motivate looking at richer alternatives to classical probability
theory as a mathematical foundation for machine learning. We systematically
examine a powerful and rich class of alternative aggregation functionals, known
variously as spectral risk measures, Choquet integrals or Lorentz norms. We
present a range of characterization results, and demonstrate what makes this
spectral family so special. In doing so we arrive at a natural stratification
of all coherent risk measures in terms of the upper probabilities that they
induce by exploiting results from the theory of rearrangement invariant Banach
spaces. We empirically demonstrate how this new approach to uncertainty helps
tackle practical machine learning problems.
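An empirical spectral risk measure replaces the plain average of losses with a weighted sum of the sorted losses. The weights come from a non-decreasing "spectrum" $\phi$ on $[0, 1]$ that integrates to 1. The expectation corresponds to $\phi \equiv 1$, and CVaR at level $\alpha$ corresponds to $\phi(u) = 1/(1-\alpha)$ for $u \ge \alpha$. This minimal sketch takes the integrated spectrum $\Phi(u) = \int_0^u \phi$ as input. The names `spectral_risk` and `cvar_cdf` are hypothetical:

```python
import numpy as np

def spectral_risk(losses, spectrum_cdf):
    """Empirical spectral risk measure: sum_i w_i * x_(i) over sorted losses.

    spectrum_cdf(u) is the integral of the spectrum phi from 0 to u; phi must be
    non-negative, non-decreasing and integrate to 1 on [0, 1].
    """
    x = np.sort(np.asarray(losses, dtype=float))
    n = len(x)
    grid = np.linspace(0.0, 1.0, n + 1)
    weights = np.diff(spectrum_cdf(grid))  # w_i = Phi(i/n) - Phi((i-1)/n)
    return float(weights @ x)

def cvar_cdf(alpha):
    """Integrated spectrum of CVaR_alpha: phi(u) = 1/(1-alpha) for u >= alpha, else 0."""
    return lambda u: np.clip((u - alpha) / (1.0 - alpha), 0.0, 1.0)

losses = range(1, 11)
print(spectral_risk(losses, lambda u: u))   # expectation
print(spectral_risk(losses, cvar_cdf(0.8)))  # mean of the worst 20% of losses
```

For the losses 1 to 10, the identity spectrum recovers the mean, 5.5. CVaR at 0.8 averages the two largest losses, giving 9.5.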
With the rapid advancement in cyber-physical systems, the increasing number
of sensors has significantly complicated manual monitoring of system states.
Consequently, graph-based time-series anomaly detection methods have gained
attention due to their ability to explicitly represent relationships between
sensors. However, these methods often apply a uniform source node
representation across all connected target nodes, even when updating different
target node representations. Moreover, the graph attention mechanism, commonly
used to infer unknown graph structures, could constrain the diversity of source
node representations. In this paper, we introduce the Edge Conditional
Node-update Graph Neural Network (ECNU-GNN). Our model, equipped with an edge
conditional node update module, dynamically transforms source node
representations based on connected edges to represent target nodes aptly. We
validate performance on three real-world datasets: SWaT, WADI, and PSM. Our
model achieves F1 scores 5.4%, 12.4%, and 6.0% higher, respectively, than
the best baseline models.
In this study, we explore the synergy of deep learning and financial market
applications, focusing on pair trading. This market-neutral strategy is
integral to quantitative finance and is apt for advanced deep-learning
techniques. A pivotal challenge in pair trading is discerning temporal
correlations among entities, necessitating the integration of diverse data
modalities. Addressing this, we introduce a novel framework, Multi-modal
Temporal Relation Graph Learning (MTRGL). MTRGL combines time series data and
discrete features into a temporal graph and employs a memory-based temporal
graph neural network. This approach reframes temporal correlation
identification as a temporal graph link prediction task, which has shown
empirical success. Our experiments on real-world datasets confirm the superior
performance of MTRGL, emphasizing its promise in refining automated pair
trading strategies.
Federated Learning (FL) is a promising technique for the collaborative
training of deep neural networks across multiple devices while preserving data
privacy. Despite its potential benefits, FL is hindered by excessive
communication costs due to repeated server-client communication during
training. To address this challenge, model compression techniques such as
sparsification and weight clustering are applied, but they often require modifying
the underlying model aggregation schemes or involve cumbersome hyperparameter
tuning; the latter not only adjusts the model's compression rate but also
limits the model's potential for continuous improvement over growing data. In this
paper, we propose FedCompress, a novel approach that combines dynamic weight
clustering and server-side knowledge distillation to reduce communication costs
while learning highly generalizable models. Through a comprehensive evaluation
on diverse public datasets, we demonstrate the efficacy of our approach
compared to baselines in terms of communication costs and inference speed. We
will make our implementation public upon acceptance.
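Weight clustering reduces communication because a client sends a small codebook plus one index per weight instead of full-precision values. This minimal sketch uses plain 1D k-means (Lloyd's algorithm). It is not FedCompress's dynamic clustering or its server-side distillation, neither of which the abstract details. `cluster_weights` and its parameters are hypothetical:

```python
import numpy as np

def cluster_weights(w, k, iters=20):
    """Quantize a weight array to k shared values with 1D k-means.

    Returns the codebook (k centroids) and, for each weight, the index of
    its centroid; w is approximated by codebook[indices].
    """
    flat = w.ravel()
    centroids = np.linspace(flat.min(), flat.max(), k)  # spread initial centroids
    for _ in range(iters):
        # Assign every weight to its nearest centroid, then recompute the means.
        idx = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
        for c in range(k):
            members = flat[idx == c]
            if members.size:
                centroids[c] = members.mean()
    idx = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
    return centroids, idx.reshape(w.shape)

codebook, indices = cluster_weights(np.array([0.0, 0.1, 0.9, 1.0]), k=2)
```

With k shared values, each weight needs only log2(k) bits for its index, plus a codebook of k floats per tensor.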
We propose EEG-SimpleConv, a straightforward 1D convolutional neural network
for Motor Imagery decoding in BCI. Our main motivation is to propose a simple,
well-performing baseline for comparison, using only very standard ingredients
from the literature. We evaluate its performance on four EEG Motor Imagery
datasets, including simulated online setups, and compare it to recent Deep
Learning and Machine Learning approaches. EEG-SimpleConv is at least as good as,
or far more efficient than, other approaches, showing strong knowledge-transfer
capabilities across subjects with a low inference time. We argue
that using off-the-shelf ingredients rather than devising ad-hoc solutions
can significantly help the adoption of Deep Learning approaches for BCI. We
make the code of the models and the experiments accessible.
This paper presents MoE-Infinity, a cost-efficient mixture-of-experts (MoE)
serving system that realizes activation-aware expert offloading. MoE-Infinity
features sequence-level expert activation tracing, a new approach adept at
identifying sparse activations and capturing the temporal locality of MoE
inference. By analyzing these traces, MoE-Infinity performs novel
activation-aware expert prefetching and caching, substantially reducing the
latency overheads usually associated with offloading experts for improved cost
performance. Extensive experiments in a cluster show that MoE-Infinity
outperforms numerous existing systems and approaches, reducing latency by
4-20x and decreasing deployment costs by over 8x for various MoEs. MoE-Infinity's
source code is publicly available at https://github.com/TorchMoE/MoE-Infinity
Thin-layer chromatography (TLC) is a crucial technique in molecular polarity
analysis. Despite its importance, the interpretability of predictive models for
TLC, especially those driven by artificial intelligence, remains a challenge.
Current approaches, utilizing either high-dimensional molecular fingerprints or
domain-knowledge-driven feature engineering, often face a dilemma between
expressiveness and interpretability. To bridge this gap, we introduce
Unsupervised Hierarchical Symbolic Regression (UHiSR), combining hierarchical
neural networks and symbolic regression. UHiSR automatically distills
chemically intuitive polarity indices and discovers interpretable equations that
link molecular structure to chromatographic behavior.
This paper examines the use of deep recurrent neural networks to classify
traffic patterns in smart cities. We propose a novel approach to traffic
pattern classification based on deep recurrent neural networks, which can
effectively capture traffic patterns' dynamic and sequential features. The
proposed model combines convolutional and recurrent layers to extract features
from traffic pattern data and a SoftMax layer to classify traffic patterns.
Experimental results show that the proposed model outperforms existing methods
in terms of accuracy, precision, recall, and F1 score. Furthermore, we provide an
in-depth analysis of the results and discuss the implications of the proposed
model for smart cities. The results show that the proposed model can accurately
classify traffic patterns in smart cities with a precision as high as 95%.
The proposed model is evaluated on a real-world traffic pattern dataset and
compared with existing classification methods.
In this paper, we describe the TTS models developed by NVIDIA for the
MMITS-VC (Multi-speaker, Multi-lingual Indic TTS with Voice Cloning) 2024
Challenge. In Tracks 1 and 2, we utilize RAD-MMM to perform few-shot TTS by
training additionally on 5 minutes of target speaker data. In Track 3, we
utilize P-Flow to perform zero-shot TTS by training on the challenge dataset as
well as external datasets. We use HiFi-GAN vocoders for all submissions.
RAD-MMM performs competitively on Tracks 1 and 2, while P-Flow ranks first on
Track 3, with a mean opinion score (MOS) of 4.4 and a speaker similarity score
(SMOS) of 3.62.
Applying process mining to unstructured data could yield significant new
insights in disciplines where unstructured data is a common data format.
Efficiently analyzing unstructured data with process mining, and conveying
confidence in the analysis results, requires overcoming multiple
challenges. The purpose of this paper is to discuss these challenges, present
initial solutions and describe future research directions. We hope that this
article lays the foundations for future collaboration on this topic.
We establish a layer-wise parameterization for 1D convolutional neural
networks (CNNs) with built-in end-to-end robustness guarantees. In doing so, we
use the Lipschitz constant of the input-output mapping characterized by a CNN
as a robustness measure. We base our parameterization on the Cayley transform
that parameterizes orthogonal matrices and the controllability Gramian of the
state space representation of the convolutional layers. The proposed
parameterization by design fulfills linear matrix inequalities that are
sufficient for Lipschitz continuity of the CNN, which further enables
unconstrained training of Lipschitz-bounded 1D CNNs. Finally, we train
Lipschitz-bounded 1D CNNs for the classification of heart arrhythmia data and
show their improved robustness.
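The Cayley transform is one ingredient of the parameterization. It maps any square matrix to an orthogonal one, and an orthogonal matrix has spectral norm 1, so it is 1-Lipschitz. This makes it useful for training with Lipschitz guarantees by unconstrained optimization. The sketch covers only this building block, not the controllability-Gramian part of the paper's construction, and `cayley` is a hypothetical helper:

```python
import numpy as np

def cayley(W):
    """Map an arbitrary square matrix W to an orthogonal matrix via the Cayley transform.

    A = W - W^T is skew-symmetric, so its eigenvalues are purely imaginary,
    I + A is invertible, and Q = (I - A)(I + A)^{-1} satisfies Q^T Q = I.
    """
    A = W - W.T
    I = np.eye(W.shape[0])
    return (I - A) @ np.linalg.inv(I + A)

rng = np.random.default_rng(0)
Q = cayley(rng.normal(size=(4, 4)))  # unconstrained parameters in, orthogonal matrix out
```

Gradient descent can update `W` freely, and every resulting `Q` stays exactly orthogonal. No projection step is needed during training.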
Correlation clustering is a well-known unsupervised learning setting that
deals with positive and negative pairwise similarities. In this paper, we study
the case where the pairwise similarities are not given in advance and must be
queried in a cost-efficient way. To this end, we develop a generic active learning
framework for this task that benefits from several advantages, e.g.,
flexibility in the type of feedback that a user/annotator can provide,
adaptation to any correlation clustering algorithm and query strategy, and
robustness to noise. In addition, we propose and analyze a number of novel
query strategies suited to this setting. We demonstrate the effectiveness of
our framework and the proposed query strategies via several experimental
studies.
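Correlation clustering scores a partition by its disagreements. A positive (similar) pair placed in different clusters is penalized, and so is a negative (dissimilar) pair placed in the same cluster, each weighted by the magnitude of its similarity. This is the standard objective, given here as a minimal sketch. The assumption, matching the active setting above, is that pairs not yet queried are simply absent from the dictionary and contribute nothing. `disagreement_cost` is a hypothetical helper:

```python
def disagreement_cost(similarities, labels):
    """Correlation clustering cost of a partition.

    similarities: dict mapping a pair (i, j) to a signed weight
                  (> 0 similar, < 0 dissimilar); unqueried pairs are omitted.
    labels: cluster id for each object.
    """
    cost = 0.0
    for (i, j), s in similarities.items():
        same = labels[i] == labels[j]
        if same and s < 0:
            cost += -s   # dissimilar pair placed together
        elif not same and s > 0:
            cost += s    # similar pair split apart
    return cost

sims = {(0, 1): 1.0, (0, 2): -1.0, (1, 2): -0.5}
print(disagreement_cost(sims, [0, 0, 1]), disagreement_cost(sims, [0, 0, 0]))
```

Grouping objects 0 and 1 and isolating object 2 violates no queried pair, for a cost of 0. Merging all three pays for both negative pairs, for a cost of 1.5.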
In this study, we examine the representation learning abilities of Denoising
Diffusion Models (DDM) that were originally intended for image generation. Our
philosophy is to deconstruct a DDM, gradually transforming it into a classical
Denoising Autoencoder (DAE). This deconstructive procedure allows us to explore
how various components of modern DDMs influence self-supervised representation
learning. We observe that only a very few modern components are critical for
learning good representations, while many others are nonessential. Our study
ultimately arrives at an approach that is highly simplified and to a large
extent resembles a classical DAE. We hope our study will rekindle interest in a
family of classical methods within the realm of modern self-supervised
learning.
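A classical denoising autoencoder corrupts each input with noise and trains the network to reconstruct the clean input.

The NumPy sketch below shows that objective with a tiny linear encoder and decoder, trained by hand-written gradient descent on hypothetical toy data. It illustrates the DAE loss only. It is not the simplified architecture the study arrives at, and the sizes, noise level, and learning rate are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 500, 20, 5
# Hypothetical toy data lying in a 5-dimensional subspace (stand-in for images)
Z = rng.normal(size=(n, k))
X = Z @ rng.normal(size=(k, d)) / np.sqrt(k)

W_enc = 0.1 * rng.normal(size=(d, k))   # encoder: x -> h
W_dec = 0.1 * rng.normal(size=(k, d))   # decoder: h -> x_hat
sigma, lr = 0.3, 0.05                   # assumed noise level and step size

def recon_loss(X_in, X_target):
    return np.mean((X_in @ W_enc @ W_dec - X_target) ** 2)

loss_before = recon_loss(X, X)
for step in range(2000):
    X_noisy = X + sigma * rng.normal(size=X.shape)   # corrupt the input
    H = X_noisy @ W_enc
    err = H @ W_dec - X                              # compare with the CLEAN input
    # gradients of the mean squared error w.r.t. decoder and encoder weights
    g_dec = H.T @ err * (2 / err.size)
    g_enc = X_noisy.T @ (err @ W_dec.T) * (2 / err.size)
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc
loss_after = recon_loss(X, X)
print(loss_before, loss_after)
```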
( 2
min )
The rapid development of large language models has revolutionized code
intelligence in software development. However, the predominance of
closed-source models has restricted extensive research and development. To
address this, we introduce the DeepSeek-Coder series, a range of open-source
code models with sizes from 1.3B to 33B parameters, trained from scratch on 2 trillion
tokens. These models are pre-trained on a high-quality project-level code
corpus and employ a fill-in-the-blank task with a 16K window to enhance code
generation and infilling. Our extensive evaluations demonstrate that
DeepSeek-Coder not only achieves state-of-the-art performance among open-source
code models across multiple benchmarks but also surpasses existing
closed-source models like Codex and GPT-3.5. Furthermore, DeepSeek-Coder models
are released under a permissive license that allows for both research and unrestricted
commercial use.
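A fill-in-the-blank objective of this kind is commonly implemented as fill-in-the-middle (FIM) training. A code document is split into a prefix, middle, and suffix, then re-ordered so that a left-to-right model learns to infill.

The sketch below shows a prefix-suffix-middle (PSM) re-ordering, assuming character-level cut points. The sentinel strings are hypothetical placeholders, not DeepSeek-Coder's actual special tokens.

```python
import random

# Hypothetical sentinel strings; real tokenizers define their own special tokens.
FIM_BEGIN, FIM_HOLE, FIM_END = "<fim_begin>", "<fim_hole>", "<fim_end>"

def to_fim_psm(doc: str, rng: random.Random) -> str:
    """Rewrite a document into Prefix-Suffix-Middle order for fill-in-the-middle training."""
    i, j = sorted(rng.sample(range(len(doc) + 1), 2))   # two distinct cut points
    prefix, middle, suffix = doc[:i], doc[i:j], doc[j:]
    # The model sees prefix and suffix first, then learns to generate the middle.
    return f"{FIM_BEGIN}{prefix}{FIM_HOLE}{suffix}{FIM_END}{middle}"

s = "def add(a, b):\n    return a + b\n"
print(to_fim_psm(s, random.Random(0)))
```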
( 2
min )
As the number of accepted papers at AI and ML conferences reaches into the
thousands, it has become unclear how researchers access and read research
publications. In this paper, we investigate the role of social media
influencers in enhancing the visibility of machine learning research,
particularly the citation counts of papers they share. We have compiled a
comprehensive dataset of over 8,000 papers, spanning tweets from December 2018
to October 2023, alongside 1:1 matched controls based on publication year,
venue, and abstract topics. Our analysis reveals a significant increase in
citations for papers endorsed by these influencers, with median citation counts
2-3 times higher than those of the control group. Additionally, the study
delves into the geographic, gender, and institutional diversity of the highlighted
authors. These findings show the expanding influence of social media in
scholarly communication and underscore the importance of an evolving ecosystem
in today's digital academic landscape.
( 2
min )
We develop a novel multiple hypothesis testing correction with family-wise
error rate (FWER) control that efficiently exploits positive dependencies
between potentially correlated statistical hypothesis tests. Our proposed
algorithm $\texttt{max-rank}$ is conceptually straightforward, relying on the
use of a $\max$-operator in the rank domain of computed test statistics. We
compare our approach to the frequently employed Bonferroni correction,
theoretically and empirically demonstrating its superiority over Bonferroni
when positive dependencies exist, and its equivalence otherwise. Our
advantage over Bonferroni increases as the number of tests rises, and we
maintain high statistical power whilst ensuring FWER control. We specifically
frame our algorithm in the context of parallel permutation testing, a scenario
that arises in our primary application of conformal prediction, a recently
popularized approach for quantifying uncertainty in complex predictive
settings.
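The idea of exploiting positive dependence through a maximum in the rank domain can be illustrated with a Westfall-Young-style permutation adjustment computed on rank-transformed (p-value) statistics.

The NumPy sketch below is an illustration under that assumption, not a reproduction of the authors' $\texttt{max-rank}$ algorithm, and the function names are hypothetical. When all tests are perfectly dependent, it returns the unadjusted p-values, whereas Bonferroni multiplies them by the number of tests.

```python
import numpy as np

def empirical_p(values: np.ndarray, null: np.ndarray) -> np.ndarray:
    """Per-test permutation p-values: (1 + #null >= value) / (B + 1).

    values: (k, m) statistics to evaluate; null: (B, m) permutation statistics."""
    B = null.shape[0]
    counts = (null[None, :, :] >= values[:, None, :]).sum(axis=1)
    return (1 + counts) / (B + 1)

def max_rank_style_adjust(t_obs: np.ndarray, t_perm: np.ndarray) -> np.ndarray:
    """FWER-adjusted p-values from the best rank across tests in each permutation.

    A small p-value corresponds to a high rank, so the minimum p-value across
    tests per permutation plays the role of the maximum rank."""
    p_obs = empirical_p(t_obs[None, :], t_perm)[0]       # (m,)
    p_perm = empirical_p(t_perm, t_perm)                  # (B, m) rank-transformed nulls
    min_p = p_perm.min(axis=1)                            # (B,) best rank per permutation
    exceed = (min_p[:, None] <= p_obs[None, :]).sum(axis=0)
    return (1 + exceed) / (t_perm.shape[0] + 1)

def bonferroni(t_obs: np.ndarray, t_perm: np.ndarray) -> np.ndarray:
    p = empirical_p(t_obs[None, :], t_perm)[0]
    return np.minimum(1.0, len(p) * p)

# Hypothetical case: 5 perfectly dependent tests sharing one permutation null
rng = np.random.default_rng(1)
base = rng.normal(size=999)
t_perm = np.tile(base[:, None], (1, 5))
t_obs = np.full(5, 2.0)
print(max_rank_style_adjust(t_obs, t_perm), bonferroni(t_obs, t_perm))
```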
( 2
min )
In this work we undertake a thorough study of the non-asymptotic properties
of the vanilla generative adversarial networks (GANs). We prove an oracle
inequality for the Jensen-Shannon (JS) divergence between the underlying
density $\mathsf{p}^*$ and the GAN estimate with a significantly better
statistical error term compared to the previously known results. The advantage
of our bound becomes clear in application to nonparametric density estimation.
We show that the JS-divergence between the GAN estimate and $\mathsf{p}^*$
decays as fast as $(\log{n}/n)^{2\beta/(2\beta + d)}$, where $n$ is the sample
size and $\beta$ determines the smoothness of $\mathsf{p}^*$. This rate of
convergence coincides (up to logarithmic factors) with the minimax optimal rate for the
considered class of densities.
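For reference, the JS divergence in the bound is $\mathrm{JS}(p, q) = \tfrac12 \mathrm{KL}(p \,\|\, m) + \tfrac12 \mathrm{KL}(q \,\|\, m)$ with $m = (p + q)/2$.

The short NumPy sketch below computes it for discrete distributions, which act as a histogram stand-in for the continuous densities treated in the paper. It uses the natural log, so values lie in $[0, \log 2]$.

```python
import numpy as np

def js_divergence(p, q) -> float:
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)                      # mixture distribution

    def kl(a, b):
        mask = a > 0                       # terms with a == 0 contribute 0
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

print(js_divergence([0.1, 0.9], [0.6, 0.4]))
```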
( 2
min )
We consider the problem of learning linear operators under squared loss
between two infinite-dimensional Hilbert spaces in the online setting. We show
that the class of linear operators with uniformly bounded $p$-Schatten norm is
online learnable for any $p \in [1, \infty)$. On the other hand, we prove an
impossibility result by showing that the class of uniformly bounded linear
operators with respect to the operator norm is \textit{not} online learnable.
Moreover, we show a separation between sequential uniform convergence and
online learnability by identifying a class of bounded linear operators that is
online learnable but for which uniform convergence does not hold. Finally, we prove that
the impossibility result and the separation between uniform convergence and
learnability also hold in the batch setting.
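The $p$-Schatten norm of an operator is the $\ell_p$ norm of its singular values. Special cases are $p = 1$ (trace norm) and $p = 2$ (Hilbert-Schmidt, i.e. Frobenius, norm). The limit $p \to \infty$ gives the operator norm, which corresponds to the class shown above not to be online learnable.

The NumPy sketch below uses finite matrices as a stand-in for operators between infinite-dimensional Hilbert spaces. The helper name is illustrative.

```python
import numpy as np

def schatten_norm(A: np.ndarray, p: float) -> float:
    """p-Schatten norm: the l_p norm of the singular values of A."""
    s = np.linalg.svd(A, compute_uv=False)
    if np.isinf(p):
        return float(s.max())              # p = inf recovers the operator (spectral) norm
    return float(np.sum(s ** p) ** (1.0 / p))

A = np.random.default_rng(0).normal(size=(5, 3))
print([schatten_norm(A, p) for p in (1, 2, np.inf)])
```

Since $\|s\|_p \ge \|s\|_\infty$, a ball of bounded Schatten norm sits inside the corresponding operator-norm ball. This is consistent with the larger operator-norm class being the harder one.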
( 2
min )
We propose a novel algorithm for the support estimation of partially known
Gaussian graphical models that incorporates prior information about the
underlying graph. In contrast to classical approaches that provide a point
estimate based on a maximum likelihood or a maximum a posteriori criterion
using (simple) priors on the precision matrix, we consider a prior on the graph
and rely on annealed Langevin diffusion to generate samples from the posterior
distribution. Since the Langevin sampler requires access to the score function
of the underlying graph prior, we use graph neural networks to effectively
estimate the score from a graph dataset (either available beforehand or
generated from a known distribution). Numerical experiments demonstrate the
benefits of our approach.
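Annealed Langevin dynamics runs updates $\mathbf{x} \leftarrow \mathbf{x} + \tfrac{\alpha}{2}\nabla_{\mathbf{x}} \log p_\sigma(\mathbf{x}) + \sqrt{\alpha}\,\mathbf{z}$ while the noise level $\sigma$ is gradually lowered. The step-size rule below follows Song and Ermon (2019).

The NumPy sketch assumes a hypothetical one-dimensional Gaussian target whose noise-perturbed score is known in closed form. In the paper, the score instead comes from a graph neural network trained on a graph dataset, and the samples are graph structures.

```python
import numpy as np

def annealed_langevin(score, x0, sigmas, n_steps=100, eps=1e-4, rng=None):
    """Annealed Langevin dynamics: Langevin steps at decreasing noise levels.

    score(x, sigma) approximates grad_x log p_sigma(x); the step size at level
    sigma_i is eps * sigma_i**2 / sigma_L**2."""
    if rng is None:
        rng = np.random.default_rng()
    x = x0.copy()
    for sigma in sigmas:
        alpha = eps * sigma**2 / sigmas[-1] ** 2
        for _ in range(n_steps):
            noise = rng.normal(size=x.shape)
            x = x + 0.5 * alpha * score(x, sigma) + np.sqrt(alpha) * noise
    return x

# Hypothetical target N(mu, s^2): its score after adding N(0, sigma^2) noise is exact.
mu, s = 2.0, 0.5
score = lambda x, sigma: -(x - mu) / (s**2 + sigma**2)
sigmas = np.geomspace(1.0, 0.01, 10)
rng = np.random.default_rng(0)
samples = annealed_langevin(score, rng.normal(size=5000), sigmas, rng=rng)
print(samples.mean(), samples.std())
```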
( 2
min )
This post provides three guided steps to architect risk management strategies while developing generative AI applications using LLMs. We first delve into the vulnerabilities, threats, and risks that arise from the implementation, deployment, and use of LLM solutions, and provide guidance on how to start innovating with security in mind. We then discuss how building on a secure foundation is essential for generative AI. Lastly, we connect these together with an example LLM workload to describe an approach towards architecting with defense-in-depth security across trust boundaries.
( 22
min )
AI Weirdness: the strange side of machine learning
( 2
min )
Remotely sensed data are dominated by mixed Land Use and Land Cover (LULC)
types. Spectral unmixing (SU) is a key technique that disentangles mixed pixels
into constituent LULC types and their abundance fractions. While existing
studies on Deep Learning (DL) for SU typically focus on single time-step
hyperspectral (HS) or multispectral (MS) data, our work pioneers SU using MODIS
MS time series, addressing missing data with end-to-end DL models. Our approach
enhances a Long Short-Term Memory (LSTM)-based model by incorporating
geographic, topographic (geo-topographic), and climatic ancillary information.
Notably, our method eliminates the need for explicit endmember extraction,
instead learning the input-output relationship between mixed spectra and LULC
abundances through supervised learning. Experimental results demonstrate that
integrating spectral-temporal input data with geo-topographic and climatic
information significantly improves the estimation of LULC abundances in mixed
pixels. To facilitate this study, we curated a novel labeled dataset for
Andalusia (Spain) with monthly MODIS multispectral time series at 460 m
resolution for 2013. Named Andalusia MultiSpectral MultiTemporal Unmixing
(Andalusia-MSMTU), this dataset provides pixel-level annotations of LULC
abundances along with ancillary information. The dataset
(https://zenodo.org/records/7752348) and code
(https://github.com/jrodriguezortega/MSMTU) are available to the public.
( 3
min )
In the field of clinical medicine, computed tomography (CT) is an effective
medical imaging modality for the diagnosis of various pathologies. Compared
with X-ray images, CT images can provide more information, including
multi-planar slices and three-dimensional structures for clinical diagnosis.
However, CT imaging requires patients to be exposed to large doses of ionizing
radiation for a long time, which may cause irreversible physical harm. In this
paper, we propose an Uncertainty-aware MedNeRF (UMedNeRF) network based on
generative radiance fields. The network can learn a continuous representation
of CT projections from 2D X-ray images by obtaining the internal structure and
depth information and using adaptive loss weights to ensure the quality of the
generated images. Our model is trained on publicly available knee and chest
datasets, and we show the results of CT projection rendering with a single
X-ray and compare our method with other methods based on generative radiance
fields.
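Radiance-field methods render a pixel by numerically integrating along a ray. For X-ray imaging, the standard physical forward model is the Beer-Lambert law, $I = I_0 \exp\!\left(-\int \mu(s)\,ds\right)$, where $\mu$ is the attenuation coefficient along the ray.

The NumPy sketch below evaluates that integral by midpoint quadrature. It assumes a hypothetical attenuation field (a uniform sphere) and uniform sampling. It is not the UMedNeRF renderer, which learns the field and uses adaptive loss weights.

```python
import numpy as np

def mu_sphere(pts, radius=1.0, mu_in=0.5):
    """Hypothetical attenuation field: a uniform sphere with coefficient mu_in."""
    return np.where(np.linalg.norm(pts, axis=-1) <= radius, mu_in, 0.0)

def xray_intensity(origin, direction, near, far, n_samples=2048, I0=1.0, mu=mu_sphere):
    """Render one detector pixel with the Beer-Lambert law via midpoint quadrature."""
    direction = direction / np.linalg.norm(direction)
    edges = np.linspace(near, far, n_samples + 1)
    t_mid = 0.5 * (edges[:-1] + edges[1:])                # sample at segment midpoints
    pts = origin[None, :] + t_mid[:, None] * direction[None, :]
    line_integral = np.sum(mu(pts) * np.diff(edges))       # approximates integral of mu ds
    return I0 * np.exp(-line_integral)

# A ray through the center crosses 2 units of material: I = exp(-0.5 * 2)
center_ray = xray_intensity(np.array([-3.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0]), 0.0, 6.0)
miss_ray = xray_intensity(np.array([-3.0, 2.0, 0.0]), np.array([1.0, 0.0, 0.0]), 0.0, 6.0)
print(center_ray, miss_ray)
```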
( 2
min )
We propose a decoder-only language model, VoxtLM, that can perform four
tasks: speech recognition, speech synthesis, text generation, and speech
continuation. VoxtLM integrates text vocabulary with discrete speech tokens
from self-supervised speech features and uses special tokens to enable
multitask learning. Compared to a single-task model, VoxtLM exhibits a
significant improvement in speech synthesis, with improvements in both speech
intelligibility from 28.9 to 5.6 and objective quality from 2.68 to 3.90.
VoxtLM also improves speech generation and speech recognition performance over
the single-task counterpart. Further, VoxtLM is trained on publicly available
data, and its training recipes and model checkpoints are open-sourced to make the
work fully reproducible.
( 2
min )
Predicting next-visit diagnoses using Electronic Health Records (EHR) is an
essential task in healthcare, critical for devising proactive future plans for
both healthcare providers and patients. Nonetheless, many preceding studies
have not sufficiently addressed the heterogeneous and hierarchical
characteristics inherent in EHR data, inevitably leading to sub-optimal
performance. To this end, we propose NECHO, a novel medical code-centric
multimodal contrastive EHR learning framework with hierarchical regularisation.
First, we integrate multifaceted information encompassing medical codes,
demographics, and clinical notes using a tailored network design and a pair of
bimodal contrastive losses, all of which pivot around a medical code
representation. We also regularise modality-specific encoders using parent-level
information from the medical ontology to learn the hierarchical structure of EHR
data. A series of experiments on MIMIC-III data demonstrates the effectiveness of
our approach.
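As a generic illustration of a bimodal contrastive objective, here is a symmetric InfoNCE sketch in NumPy. It is not NECHO's exact loss.

The sketch assumes that row i of each modality's embedding batch belongs to the same visit. For example, `za` could hold hypothetical medical-code embeddings and `zb` clinical-note embeddings, while all other rows in the batch act as negatives.

```python
import numpy as np

def info_nce(za: np.ndarray, zb: np.ndarray, temperature: float = 0.1) -> float:
    """Symmetric InfoNCE loss between two batches of paired embeddings."""
    za = za / np.linalg.norm(za, axis=1, keepdims=True)
    zb = zb / np.linalg.norm(zb, axis=1, keepdims=True)
    logits = za @ zb.T / temperature                  # (N, N) scaled cosine similarities

    def ce_diag(l):
        l = l - l.max(axis=1, keepdims=True)          # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))           # positives sit on the diagonal

    return float(0.5 * (ce_diag(logits) + ce_diag(logits.T)))

za = np.random.default_rng(0).normal(size=(8, 16))
print(info_nce(za, za), info_nce(za, za[::-1]))   # aligned pairs vs. mismatched pairs
```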
( 2
min )
With the rise in communication capacity, deep neural networks (DNN) for
digital pre-distortion (DPD) to correct non-linearity in wideband power
amplifiers (PAs) have become prominent. Yet, there is a void in open-source and
measurement-setup-independent platforms for fast DPD exploration and objective
DPD model comparison. This paper presents an open-source framework, OpenDPD,
crafted in PyTorch, with an associated dataset for PA modeling and DPD
learning. We introduce a Dense Gated Recurrent Unit (DGRU)-DPD, trained via a
novel end-to-end learning architecture, outperforming previous DPD models on a
digital PA (DPA) in the new digital transmitter (DTX) architecture with
unconventional transfer characteristics compared to analog PAs. Measurements
show our DGRU-DPD achieves an ACPR of -44.69/-44.47 dBc and an EVM of -35.22 dB
for 200 MHz OFDM signals. OpenDPD code, datasets, and documentation are
publicly available at https://github.com/lab-emi/OpenDPD.
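EVM, one of the two figures of merit quoted above, compares the linearized PA output with the ideal reference symbols.

The NumPy sketch below uses the common RMS definition normalized by reference power. It assumes time-aligned complex baseband samples and hypothetical QPSK data. Measurement standards differ in normalization details, and this is not OpenDPD's own metric code.

```python
import numpy as np

def evm_db(received: np.ndarray, reference: np.ndarray) -> float:
    """RMS error vector magnitude in dB, normalized by the reference signal power."""
    error = received - reference
    ratio = np.sum(np.abs(error) ** 2) / np.sum(np.abs(reference) ** 2)
    return float(10 * np.log10(ratio))

# Hypothetical QPSK reference symbols and an output with a 1% gain error
rng = np.random.default_rng(0)
ref = (rng.choice([-1, 1], 1000) + 1j * rng.choice([-1, 1], 1000)) / np.sqrt(2)
print(evm_db(1.01 * ref, ref))   # a 1% error vector corresponds to -40 dB
```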
( 2
min )
Self-supervised learning (SSL) has emerged as a promising paradigm for
learning flexible speech representations from unlabeled data. By designing
pretext tasks that exploit statistical regularities, SSL models can capture
useful representations that are transferable to downstream tasks. This study
provides an empirical analysis of Barlow Twins (BT), an SSL technique inspired
by theories of redundancy reduction in human perception. On downstream tasks,
BT representations accelerated learning and transferred across domains.
However, limitations exist in disentangling key explanatory factors, with
redundancy reduction and invariance alone insufficient for factorization of
learned latents into modular, compact, and informative codes. Our ablation
study isolated gains from invariance constraints, but the gains were
context-dependent. Overall, this work substantiates the potential of Barlow
Twins for sample-efficient speech encoding. However, challenges remain in
achieving fully hierarchical representations. The analysis methodology and
insights pave a path for extensions incorporating further inductive priors and
perceptual principles to further enhance the BT self-supervision framework.
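The Barlow Twins objective pushes the cross-correlation matrix between two views' embeddings toward the identity. The diagonal term enforces invariance, and the off-diagonal term implements redundancy reduction.

The NumPy sketch below follows the standard formulation, with batch-standardized features and a default $\lambda = 5 \times 10^{-3}$ as in the original Barlow Twins paper. The embeddings are hypothetical.

```python
import numpy as np

def barlow_twins_loss(z1: np.ndarray, z2: np.ndarray, lambd: float = 5e-3) -> float:
    """Barlow Twins objective on two (N, D) embedding batches of augmented views."""
    n = z1.shape[0]
    z1 = (z1 - z1.mean(0)) / z1.std(0)                   # standardize each feature
    z2 = (z2 - z2.mean(0)) / z2.std(0)
    c = z1.T @ z2 / n                                    # (D, D) cross-correlation matrix
    on_diag = np.sum((np.diag(c) - 1.0) ** 2)            # invariance term
    off_diag = np.sum(c ** 2) - np.sum(np.diag(c) ** 2)  # redundancy-reduction term
    return float(on_diag + lambd * off_diag)

rng = np.random.default_rng(0)
z = rng.normal(size=(256, 8))
print(barlow_twins_loss(z, z), barlow_twins_loss(z, rng.normal(size=(256, 8))))
```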
( 2
min )
In replay-based methods for continual learning, replaying input samples in
episodic memory has shown its effectiveness in alleviating catastrophic
forgetting. However, the role of cross-entropy loss with softmax as a potential
key factor in catastrophic forgetting has been underexplored. In this
paper, we analyze the effect of softmax and revisit softmax masking with
negative infinity to shed light on its ability to mitigate catastrophic
forgetting. Our analyses show that negative infinity masked
softmax is not always compatible with dark knowledge. To improve the
compatibility, we propose a general masked softmax that controls the stability
by adjusting the gradient scale to old and new classes. We demonstrate that
applying our method to other replay-based methods results in better
performance, primarily by enhancing model stability in continual learning
benchmarks, even when the buffer size is set to an extremely small value.
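The effect of negative-infinity masking is visible directly in the gradient. A logit masked with $-\infty$ gets probability exactly zero, so the masked classes' logits receive no update from this loss.

The NumPy sketch below shows that baseline with hypothetical logits. It is not the paper's general masked softmax, which instead rescales the gradients for old and new classes.

```python
import numpy as np

def masked_softmax_ce(logits, target, active_classes):
    """Cross-entropy where logits of inactive classes are masked with -inf.

    Returns (loss, probabilities, gradient w.r.t. the logits)."""
    masked = np.full_like(logits, -np.inf)
    masked[active_classes] = logits[active_classes]
    shifted = masked - masked[active_classes].max()   # stability shift; -inf stays -inf
    probs = np.exp(shifted)                           # exp(-inf) = 0 for masked classes
    probs /= probs.sum()
    loss = -np.log(probs[target])
    grad = probs.copy()
    grad[target] -= 1.0                               # softmax cross-entropy gradient
    return float(loss), probs, grad

logits = np.array([2.0, 1.0, 0.5, 3.0])               # hypothetical: classes 0-1 old, 2-3 new
loss, probs, grad = masked_softmax_ce(logits, target=2, active_classes=[2, 3])
print(loss, probs, grad)
```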
( 2
min )
Efficiently generating energetically stable crystal structures has long been
a challenge in material design, primarily due to the immense number of possible
arrangements of atoms in a crystal lattice. To facilitate the discovery of stable materials, we
present a framework for the generation of synthesizable materials, leveraging a
point cloud representation to encode intricate structural information. At the
heart of this framework lies the introduction of a diffusion model as its
foundational pillar. To gauge the efficacy of our approach, we employ it to
reconstruct input structures from our training datasets, rigorously validating
its high reconstruction performance. Furthermore, we demonstrate the profound
potential of Point Cloud-Based Crystal Diffusion (PCCD) by generating entirely
new materials, emphasizing their synthesizability. Our research stands as a
noteworthy contribution to the advancement of materials design and synthesis
through the cutting-edge avenue of generative design instead of the
conventional substitution or experience-based discovery.
( 2
min )
We present a small study analyzing how prompt token classification loss
weighting (PLW) affects the performance of 7B-size LLaMA models fine-tuned on
instruction tasks. We recreated Stanford's Alpaca experiment with both LLaMA 1
and LLaMA 2 using multiple instruction datasets. We found that the performance of models
fine-tuned on our short-completion dataset has a negative quadratic
relationship with PLW, while models fine-tuned on long-completion datasets were
unaffected by PLW.
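Prompt loss weighting scales the contribution of prompt tokens to the fine-tuning loss. A PLW of 0 trains on completions only, while a PLW of 1 weights prompt and completion tokens equally.

The NumPy sketch below assumes per-token losses are already computed and that the weighted sum is normalized by the total weight. Normalization conventions vary between training codebases.

```python
import numpy as np

def plw_loss(token_nll: np.ndarray, is_prompt: np.ndarray, plw: float) -> float:
    """Weighted next-token loss: prompt tokens weighted by plw, completion tokens by 1.

    token_nll: per-token negative log-likelihoods; is_prompt: boolean mask."""
    weights = np.where(is_prompt, plw, 1.0)
    return float(np.sum(weights * token_nll) / np.sum(weights))

nll = np.array([1.0, 1.0, 3.0, 3.0])                  # hypothetical per-token losses
prompt_mask = np.array([True, True, False, False])
print(plw_loss(nll, prompt_mask, 0.0), plw_loss(nll, prompt_mask, 1.0))
```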
( 2
min )
The Surrogate Modeling Toolbox (SMT) is an open-source Python package that
offers a collection of surrogate modeling methods, sampling techniques, and a
set of sample problems. This paper presents SMT 2.0, a major new release of SMT
that introduces significant upgrades and new features to the toolbox. This
release adds the capability to handle mixed-variable surrogate models and
hierarchical variables. These types of variables are becoming increasingly
important in several surrogate modeling applications. SMT 2.0 also improves SMT
by extending sampling methods, adding new surrogate models, and computing
variance and kernel derivatives for Kriging. This release also includes new
functions to handle noisy data and to use multifidelity data. To the best of our
knowledge, SMT 2.0 is the first open-source surrogate library to propose
surrogate models for hierarchical and mixed inputs. This open-source software
is distributed under the New BSD license.
( 3
min )
Recently, there has been a growing interest in mixed-categorical meta-models
based on Gaussian process (GP) surrogates. In this setting, several existing
approaches use different strategies either by using continuous kernels (e.g.,
continuous relaxation and Gower distance based GP) or by using a direct
estimation of the correlation matrix. In this paper, we present a kernel-based
approach that extends continuous exponential kernels to handle
mixed-categorical variables. The proposed kernel leads to a new GP surrogate
that generalizes both the continuous relaxation and the Gower distance based GP
models. We demonstrate, on both analytical and engineering problems, that our
proposed GP model gives a higher likelihood and a smaller residual error than
the other kernel-based state-of-the-art models. Our method is available in the
open-source software SMT.
( 2
min )
A major challenge in Natural Language Processing is obtaining annotated data
for supervised learning. An option is the use of crowdsourcing platforms for
data annotation. However, crowdsourcing introduces issues related to the
annotator's experience, consistency, and biases. An alternative is to use
zero-shot methods, which in turn have limitations compared to their few-shot or
fully supervised counterparts. Recent advancements driven by large language
models show potential, but struggle to adapt to specialized domains with
severely limited data. The most common approaches therefore involve humans
randomly annotating a set of datapoints to build initial datasets. But
randomly sampling data to be annotated is often inefficient as it ignores the
characteristics of the data and the specific needs of the model. The situation
worsens when working with imbalanced datasets, as random sampling tends to
heavily bias towards the majority classes, leading to an excessive amount of annotated data.
To address these issues, this paper contributes an automatic and informed data
selection architecture to build a small dataset for few-shot learning. Our
proposal minimizes the quantity and maximizes the diversity of data selected for
human annotation, while improving model performance.
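One standard way to favor diversity when spending a small annotation budget is greedy k-center (farthest-point) selection over embeddings. Unlike random sampling, it does not concentrate on the majority class.

The NumPy sketch below illustrates diversity-driven selection under that assumption. It is not the paper's selection architecture, and `embeddings` is a hypothetical (N, D) array of text representations.

```python
import numpy as np

def k_center_greedy(embeddings: np.ndarray, budget: int, seed: int = 0) -> list:
    """Select `budget` indices that greedily maximize coverage of the embedding space."""
    rng = np.random.default_rng(seed)
    first = int(rng.integers(len(embeddings)))
    selected = [first]
    # distance from every point to its nearest selected point
    dist = np.linalg.norm(embeddings - embeddings[first], axis=1)
    while len(selected) < budget:
        nxt = int(np.argmax(dist))          # farthest point from the current selection
        selected.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(embeddings - embeddings[nxt], axis=1))
    return selected

# Hypothetical imbalanced data: 95 majority-class points, 5 minority-class points
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (95, 2)), rng.normal(10, 0.1, (5, 2))])
print(k_center_greedy(X, 2))   # one index from each cluster
```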
( 3
min )
This paper describes $\pi2\text{vec}$, a method for representing behaviors of
black-box policies as feature vectors. The policy representations capture how
the statistics of foundation model features change in response to the policy
behavior in a task-agnostic way, and can be trained from offline data, allowing
them to be used in offline policy selection. This work provides a key piece of
a recipe for fusing together three modern lines of research: offline policy
evaluation as a counterpart to offline RL, foundation models as generic and
powerful state representations, and efficient policy selection in
resource-constrained environments.
( 2
min )
In recent years, various powerful policy gradient algorithms have been
proposed in deep reinforcement learning. While all these algorithms build on
the Policy Gradient Theorem, the specific design choices differ significantly
across algorithms. We provide a holistic overview of on-policy policy gradient
algorithms to facilitate the understanding of both their theoretical
foundations and their practical implementations. In this overview, we include a
detailed proof of the continuous version of the Policy Gradient Theorem,
convergence results and a comprehensive discussion of practical algorithms. We
compare the most prominent algorithms on continuous control environments and
provide insights on the benefits of regularization. All code is available at
https://github.com/Matt00n/PolicyGradientsJax.
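All of these algorithms build on the Policy Gradient Theorem, $\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}\!\left[\nabla_\theta \log \pi_\theta(a \mid s)\, Q^{\pi_\theta}(s, a)\right]$.

Its simplest Monte Carlo instance, REINFORCE with a baseline, is sketched below. The example uses a hypothetical 3-armed bandit with a softmax policy, written in NumPy rather than the JAX used in the linked repository. The learning rates are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.5, 0.9])   # hypothetical mean reward of each arm
theta = np.zeros(3)                       # softmax policy logits
lr, baseline = 0.1, 0.0

def policy(theta):
    e = np.exp(theta - theta.max())
    return e / e.sum()

for step in range(5000):
    probs = policy(theta)
    a = rng.choice(3, p=probs)
    r = true_means[a] + 0.1 * rng.normal()
    grad_log_pi = -probs                  # d log pi(a) / d theta = onehot(a) - probs
    grad_log_pi[a] += 1.0
    theta += lr * (r - baseline) * grad_log_pi   # REINFORCE update with baseline
    baseline += 0.05 * (r - baseline)     # running-average reward baseline

print(policy(theta))
```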
( 2
min )
Synthesizing the sound of guitar performances is a highly challenging task due to the
polyphony and high variability in expression. Recently, deep generative models
have shown promising results in synthesizing expressive polyphonic instrument
sounds from music scores, often using a generic MIDI input. In this work, we
propose an expressive acoustic guitar sound synthesis model with an input
representation customized to the instrument, which we call guitarroll. We implement
the proposed approach using diffusion-based outpainting, which can generate
audio with long-term consistency. To overcome the lack of MIDI/audio-paired
datasets, we not only used an existing guitar dataset but also collected data
from a high-quality sample-based guitar synthesizer. Through quantitative and
qualitative evaluations, we show that our proposed model has higher audio
quality than the baseline model and generates more realistic timbre sounds than
the previous leading work.
( 2
min )
In this paper, we present a novel approach for detecting the discontinuity
interfaces of a discontinuous function. This approach leverages Graph-Informed
Neural Networks (GINNs) and sparse grids to address discontinuity detection
even in domains of dimension larger than 3. GINNs, trained to identify troubled
points on sparse grids, exploit graph structures built on the grids to achieve
efficient and accurate discontinuity detection. We also introduce
a recursive algorithm for general sparse grid-based detectors, characterized by
convergence properties and easy applicability. Numerical experiments on
functions with dimensions n = 2 and n = 4 demonstrate the efficiency and robust
generalization of GINNs in detecting discontinuity interfaces. Notably, the
trained GINNs offer portability and versatility, allowing integration into
various algorithms and sharing among users.
( 2
min )
Cross-validation is a widely used technique for assessing the performance of
predictive models on unseen data. Many predictive models, such as Kernel-Based
Partial Least-Squares (PLS) models, require the computation of
$\mathbf{X}^{\mathbf{T}}\mathbf{X}$ and $\mathbf{X}^{\mathbf{T}}\mathbf{Y}$
using only training set samples from the input and output matrices,
$\mathbf{X}$ and $\mathbf{Y}$, respectively. In this work, we present three
algorithms that efficiently compute these matrices. The first one allows no
column-wise preprocessing. The second one allows column-wise centering around
the training set means. The third one allows column-wise centering and
column-wise scaling around the training set means and standard deviations.
Demonstrating correctness and superior computational complexity, they offer
significant cross-validation speedup compared with straightforward
cross-validation and previous work on fast cross-validation, all without data
leakage. Their suitability for parallelization is highlighted with an
open-source Python implementation combining our algorithms with Improved Kernel
PLS.
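The identity that avoids recomputation is that, for a validation fold $V$, $\mathbf{X}_{\text{train}}^{\mathbf{T}}\mathbf{X}_{\text{train}} = \mathbf{X}^{\mathbf{T}}\mathbf{X} - \mathbf{X}_V^{\mathbf{T}}\mathbf{X}_V$. Centering around the training-set means can then be applied through a rank-one correction.

The NumPy sketch below covers the no-preprocessing and centering cases and assumes samples are stored as rows. It is not the authors' released implementation, and the scaling case is omitted.

```python
import numpy as np

def cv_products(X, Y, folds, center=False):
    """Yield training-set X^T X and X^T Y for each fold without recomputing from scratch."""
    XtX, XtY = X.T @ X, X.T @ Y             # computed once on the full data
    x_sum, y_sum, n = X.sum(0), Y.sum(0), X.shape[0]
    for val in folds:
        Xv, Yv = X[val], Y[val]
        A = XtX - Xv.T @ Xv                 # training-set X^T X
        B = XtY - Xv.T @ Yv                 # training-set X^T Y
        if center:
            n_t = n - len(val)
            mx = (x_sum - Xv.sum(0)) / n_t  # training-set column means
            my = (y_sum - Yv.sum(0)) / n_t
            A = A - n_t * np.outer(mx, mx)  # rank-one centering correction
            B = B - n_t * np.outer(mx, my)
        yield A, B

rng = np.random.default_rng(0)
X, Y = rng.normal(size=(20, 4)), rng.normal(size=(20, 2))
folds = np.array_split(np.arange(20), 4)
products = list(cv_products(X, Y, folds, center=True))
```

Only the validation rows are touched for each fold. The per-fold cost therefore scales with the fold size rather than the training-set size.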
( 2
min )
A method for solving elasticity problems based on separable physics-informed
neural networks (SPINN) in conjunction with the deep energy method (DEM) is
presented. Numerical experiments have been carried out for a number of problems
showing that this method has a significantly higher convergence rate and
accuracy than the vanilla physics-informed neural networks (PINN) and even
SPINN based on a system of partial differential equations (PDEs). In addition,
using SPINN within the DEM framework, it is possible to solve
problems of the linear theory of elasticity on complex geometries, which is
unachievable with PINNs formulated in terms of partial differential
equations. The considered problems are very close to industrial problems in
terms of geometry, loading, and material parameters.
( 2
min )
We consider the ubiquitous linear inverse problems with additive Gaussian
noise and propose an unsupervised sampling approach called diffusion model
based posterior sampling (DMPS) to reconstruct the unknown signal from noisy
linear measurements. Specifically, when a diffusion model (DM) is used as an
implicit prior, the fundamental difficulty in performing posterior sampling is
that the noise-perturbed likelihood score, i.e., gradient of an annealed
likelihood function, is intractable. To circumvent this problem, we introduce a
simple yet effective closed-form approximation using an uninformative prior
assumption. Extensive experiments are conducted on a variety of noisy linear
inverse problems such as noisy super-resolution, denoising, deblurring, and
colorization. In all tasks, the proposed DMPS achieves highly competitive
or even better performance while being 3 times faster than
the state-of-the-art competitor diffusion posterior sampling (DPS).
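Some intuition for the closed-form approximation comes from a simplified setting. Take a variance-exploding perturbation $\mathbf{x}_t = \mathbf{x}_0 + \sigma_t \boldsymbol{\epsilon}$, measurements $\mathbf{y} = \mathbf{A}\mathbf{x}_0 + \mathbf{n}$ with $\mathbf{n} \sim \mathcal{N}(\mathbf{0}, \sigma_y^2 \mathbf{I})$, and a flat (uninformative) prior on $\mathbf{x}_0$. Then $\mathbf{x}_0 \mid \mathbf{x}_t \sim \mathcal{N}(\mathbf{x}_t, \sigma_t^2 \mathbf{I})$, so $p(\mathbf{y} \mid \mathbf{x}_t) = \mathcal{N}(\mathbf{A}\mathbf{x}_t,\ \sigma_y^2\mathbf{I} + \sigma_t^2 \mathbf{A}\mathbf{A}^\top)$ and its score is available in closed form.

The NumPy sketch below computes that score under these assumptions. The paper's exact parameterization, such as its noise schedule, may differ.

```python
import numpy as np

def likelihood_score(x_t, y, A, sigma_t, sigma_y):
    """grad_x log N(y; A x_t, sigma_y^2 I + sigma_t^2 A A^T) under a flat prior on x_0."""
    cov = sigma_y**2 * np.eye(A.shape[0]) + sigma_t**2 * (A @ A.T)
    return A.T @ np.linalg.solve(cov, y - A @ x_t)

x = np.array([1.0, -2.0, 0.5])       # hypothetical current iterate
y = np.array([0.0, 1.0, 2.0])        # hypothetical measurements
print(likelihood_score(x, y, np.eye(3), sigma_t=1.0, sigma_y=0.5))
```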
( 2
min )
In this work we demonstrate that significant gains in performance and data
efficiency can be achieved in High Energy Physics (HEP) by moving beyond the
standard paradigm of sequential optimization of reconstruction and analysis
components. We conceptually connect HEP reconstruction and analysis to modern
machine learning workflows such as pretraining, finetuning, domain adaptation
and high-dimensional embedding spaces and quantify the gains in the example
use case of searches for heavy resonances decaying via an intermediate di-Higgs
system to four $b$-jets.
( 2
min )
Large language models (LLMs) are generating information at a rapid pace,
requiring users to increasingly rely on and trust the data. Despite remarkable
advances in LLMs, information generated by LLMs is not completely trustworthy,
due to challenges in information quality. Specifically, the integrity of
information quality decreases due to unreliable, biased tokenization during
pre-training of LLMs. Moreover, decreased information quality has
led to hallucination and fabricated information. Unreliable information can
lead to flawed decisions in businesses, which impacts economic activity.
In this work, we introduce a novel mathematical evaluation of the information
quality of LLMs; we furthermore analyze and highlight information quality
challenges and scaling laws to systematically scale language models.
( 2
min )
The SINDy algorithm has been successfully used to identify the governing
equations of dynamical systems from time series data. However, SINDy assumes
the user has prior knowledge of the variables in the system and of a function
library that can act as a basis for the system. In this paper, we demonstrate
on real-world data how the Augmented SINDy algorithm outperforms SINDy in the
presence of system variable uncertainty. We then show that SINDy can be further
augmented to perform robustly when both kinds of uncertainty are present.
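SINDy's core regression step is sequentially thresholded least squares (STLSQ) over a library of candidate functions.

The NumPy sketch below recovers $\dot{x} = -2x$ from simulated data using a hypothetical polynomial library $[1, x, x^2, x^3]$. It shows plain SINDy only, not the Augmented SINDy variant studied in the paper.

```python
import numpy as np

def stlsq(Theta, dXdt, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares: sparse Xi with dXdt ~ Theta @ Xi."""
    Xi = np.linalg.lstsq(Theta, dXdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(Xi) < threshold
        Xi[small] = 0.0                                 # prune small coefficients
        for k in range(dXdt.shape[1]):                  # refit each state on survivors
            big = ~small[:, k]
            if big.any():
                Xi[big, k] = np.linalg.lstsq(Theta[:, big], dXdt[:, k], rcond=None)[0]
    return Xi

t = np.linspace(0, 2, 401)
x = 3.0 * np.exp(-2.0 * t)                              # solution of dx/dt = -2x
dxdt = np.gradient(x, t, edge_order=2)                  # finite-difference derivative
Theta = np.column_stack([np.ones_like(x), x, x**2, x**3])   # candidate library
Xi = stlsq(Theta, dxdt[:, None])
print(Xi.ravel())
```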
( 2
min )
Emotion Recognition in Conversation (ERC) plays a crucial role in enabling
dialogue systems to effectively respond to user requests. The emotions in a
conversation can be identified by the representations from various modalities,
such as audio, visual, and text. However, due to the weak contribution of
non-verbal modalities to emotion recognition, multimodal ERC has always been
considered a challenging task. In this paper, we propose Teacher-leading
Multimodal fusion network for ERC (TelME). TelME incorporates cross-modal
knowledge distillation to transfer information from a language model acting as
the teacher to the non-verbal students, thereby optimizing the efficacy of the
weak modalities. We then combine multimodal features using a shifting fusion
approach in which student networks support the teacher. TelME achieves
state-of-the-art performance on MELD, a multi-speaker conversation dataset for
ERC. Finally, we demonstrate the effectiveness of our components through
additional experiments.
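Cross-modal knowledge distillation typically trains a student to match a teacher's softened output distribution.

The NumPy sketch below shows the standard temperature-scaled KL distillation loss of Hinton et al. as a generic illustration. TelME's actual distillation objectives may differ, and the logits are hypothetical.

```python
import numpy as np

def softmax(z, T):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, T=2.0):
    """T^2-scaled KL(teacher || student) on temperature-softened distributions."""
    p_t = softmax(teacher_logits, T)
    log_p_s = np.log(softmax(student_logits, T))
    kl = np.sum(p_t * (np.log(p_t) - log_p_s), axis=-1)
    return float(T**2 * kl.mean())

teacher = np.array([[2.0, 0.5, -1.0]])      # e.g. logits from the text (teacher) model
student = np.array([[0.0, 0.0, 3.0]])       # e.g. logits from an audio/visual student
print(kd_loss(student, teacher))
```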
( 2
min )
Receiver operating characteristic (ROC) analysis is widely used for
evaluating diagnostic systems. Recent studies have shown that estimating the
area under the ROC curve (AUC) with standard cross-validation methods suffers from
a large bias. The leave-pair-out (LPO) cross-validation has been shown to
correct this bias. However, while LPO produces an almost unbiased estimate of
AUC, it does not provide a ranking of the data needed for plotting and
analyzing the ROC curve. In this study, we propose a new method called
tournament leave-pair-out (TLPO) cross-validation. This method extends LPO by
creating a tournament from pair comparisons to produce a ranking for the data.
TLPO preserves the advantage of LPO for estimating AUC, while it also allows
performing ROC analyses. Using both synthetic and real-world data, we show
that TLPO is as reliable as LPO for AUC estimation, and we confirm the bias in
leave-one-out cross-validation on low-dimensional data. As a case study on ROC
analysis, we also evaluate how reliably sensitivity and specificity can be
estimated from TLPO ROC curves.
( 2
min )
In causal inference with panel data under staggered adoption, the goal is to
estimate and derive confidence intervals for potential outcomes and treatment
effects. We propose a computationally efficient procedure, involving only
simple matrix algebra and singular value decomposition. We derive
non-asymptotic bounds on the entrywise error, showing that it is close to a
suitably scaled Gaussian variable. Despite its simplicity, our procedure turns
out to be instance-optimal, in that our theoretical scaling matches a local
instance-wise lower bound derived via a Bayesian Cram\'{e}r-Rao argument. Using
our insights, we develop a data-driven procedure for constructing entrywise
confidence intervals with pre-specified coverage guarantees. Our analysis is
based on a general inferential toolbox for the SVD algorithm applied to the
matrix denoising model, which might be of independent interest.
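The core computational primitive here is SVD-based denoising of a noisy matrix Y = M + E with low-rank M. A minimal sketch of rank-r truncated SVD in NumPy follows; it illustrates the generic building block only, not the paper's full procedure for panel data or its confidence intervals, and the target rank is assumed to be known.

```python
import numpy as np

def svd_denoise(Y, rank):
    """Rank-`rank` truncated SVD estimate of a low-rank signal matrix."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    # Keep the leading singular triplets and discard the rest as noise.
    return (U[:, :rank] * s[:rank]) @ Vt[:rank, :]
```

When the noise is small relative to the signal's singular values, the truncated estimate is typically much closer to M than Y itself, because most of the noise energy lies outside the leading singular subspaces.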
( 2
min )
In this post, we show you how to bring Amazon Q, your business expert, to users in Microsoft Teams. (If you use Slack, refer to Deploy a Slack gateway for Amazon Q, your business expert.) You’ll be able to converse with Amazon Q, your business expert, using Teams direct messages (DMs) to ask questions and get answers based on company data, get help creating new content such as email drafts, summarize attached files, and perform tasks.
( 10
min )
This GFN Thursday levels up PC gaming on mobile with higher-resolution support on Android devices. This week also brings 10 new games to the GeForce NOW library, including Enshrouded.
Pixel Perfect
GeForce NOW transforms nearly any device into a high-powered PC gaming rig, and members streaming on Android can now access that power from the …
( 6
min )
This paper introduces an \textit{online bilevel optimization} setting in
which a sequence of time-varying bilevel problems is revealed one after the
other. We extend the known regret bounds for single-level online algorithms to
the bilevel setting. Specifically, we provide new notions of \textit{bilevel
regret}, develop an online alternating time-averaged gradient method that is
capable of leveraging smoothness, and give regret bounds in terms of the
path-length of the inner and outer minimizer sequences.
( 2
min )
The wave equation is an important physical partial differential equation, and
in recent years, deep learning has shown promise in accelerating or replacing
traditional numerical methods for solving it. However, existing deep learning
methods suffer from high data acquisition costs, low training efficiency, and
insufficient generalization capability for boundary conditions. To address
these issues, this paper proposes an unsupervised learning method for the wave
equation based on finite difference residual constraints. We construct a novel
finite difference residual constraint based on structured grids and finite
difference methods, as well as an unsupervised training strategy, enabling
convolutional neural networks to train without data and predict the forward
propagation process of waves. Experimental results show that finite difference
residual constraints have advantages over physics-informed neural network
(PINN)-type physics constraints, such as easier fitting, lower
computational costs, and stronger source term generalization capability, making
our method more efficient to train and more powerful in application.
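As an illustration of a finite difference residual constraint, the sketch below evaluates the standard second-order central-difference residual of the 1D wave equation u_tt = c^2 u_xx on a structured grid; its mean square could serve as a data-free training loss for a network that outputs the grid values. The paper works with CNNs on structured grids and includes a source term; the 1D, source-free setting and the function names here are simplifying assumptions.

```python
import numpy as np

def wave_fd_residual(u, c, dx, dt):
    """Central-difference residual of u_tt - c^2 u_xx on interior grid points.

    u has shape (n_t, n_x) with u[n, i] approximating u(t_n, x_i).
    """
    u_tt = (u[2:, 1:-1] - 2.0 * u[1:-1, 1:-1] + u[:-2, 1:-1]) / dt**2
    u_xx = (u[1:-1, 2:] - 2.0 * u[1:-1, 1:-1] + u[1:-1, :-2]) / dx**2
    return u_tt - c**2 * u_xx

def fd_residual_loss(u, c, dx, dt):
    """Mean squared residual: a label-free loss on a predicted wavefield."""
    return float(np.mean(wave_fd_residual(u, c, dx, dt) ** 2))
```

An exact solution such as u = sin(pi x) cos(pi t) with c = 1 gives a residual near zero, whereas an arbitrary field gives a large one, which is what lets this residual act as a training signal without labeled data.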
( 2
min )
Register allocation is one of the most important problems for modern
compilers. With a practically unlimited number of user variables and a small
number of CPU registers, assigning variables to registers without conflicts is
a complex task. This work casts the register allocation problem as a graph
coloring problem. Using technologies such as PyTorch and OpenAI Gymnasium
environments, we show that a Proximal Policy Optimization model can learn to
solve the graph coloring problem. We also show that the labeling of a graph is
critical to the performance of the model by taking the matrix representation of
a graph and permuting it. We then test the model's effectiveness on each of
these permutations and show that it is not effective when given a relabeling of
the same graph. Our main contribution lies
in showing the need for label reordering invariant representations of graphs
for machine learning models to achieve consistent performance.
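Relabeling a graph with n vertices corresponds to permuting its adjacency matrix, A' = P A P^T, for a permutation matrix P. The sketch below (helper names are illustrative) shows that the relabeled matrix looks different to a model that consumes raw matrices even though the graph is the same, and that a valid coloring must be permuted along with the labels.

```python
import numpy as np

def relabel(adj, perm):
    """Adjacency matrix after renaming old vertex perm[k] to new vertex k."""
    P = np.eye(len(perm), dtype=adj.dtype)[perm]  # row k is e_{perm[k]}
    return P @ adj @ P.T

def is_valid_coloring(adj, colors):
    """True if no edge joins two vertices of the same color."""
    i, j = np.nonzero(adj)
    return bool(np.all(colors[i] != colors[j]))
```

For a path 0-1-2-3 colored [0, 1, 0, 1], the relabeling perm = [2, 0, 3, 1] yields a different matrix; the permuted coloring colors[perm] remains valid, while reusing the old color vector on the new labels does not.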
( 2
min )
The MagNet Challenge 2023 calls upon competitors to develop data-driven
models for the material-specific, waveform-agnostic estimation of steady-state
power losses in toroidal ferrite cores. The following HARDCORE (H-field and
power loss estimation for Arbitrary waveforms with Residual, Dilated
convolutional neural networks in ferrite COREs) approach shows that a residual
convolutional neural network with physics-informed extensions can serve this
task efficiently when trained on observational data beforehand. One key
element of the solution is an intermediate model layer that first reconstructs
the B-H curve and then estimates the power losses from the curve's area,
rendering the proposed topology physically interpretable. In addition, emphasis
was placed on expert-based feature engineering and information-rich inputs in
order to enable a lean model architecture. A model is trained from scratch for
each material, while the topology remains the same. A Pareto-style trade-off
between model size and estimation accuracy is demonstrated, which yields an
optimum at as few as 1755 parameters and a 95th-percentile relative error below
8\,\% for the worst-case material with sufficient samples.
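The link between the B-H curve and power loss is the standard hysteresis-loss relation: the energy dissipated per unit volume per cycle is the closed-loop integral of H dB, so the volumetric loss is P_v = f times that integral. A minimal NumPy sketch follows, assuming B (in T) and H (in A/m) are sampled uniformly over exactly one period without repeating the first sample. It is not the HARDCORE model, only the physics its intermediate layer relies on.

```python
import numpy as np

def core_loss_density(b, h, freq):
    """Volumetric core loss (W/m^3) from one period of B (T) and H (A/m)."""
    b_next = np.roll(b, -1)  # wrap around so the loop is closed
    h_next = np.roll(h, -1)
    # Trapezoidal approximation of the loop integral of H dB (J/m^3 per cycle).
    energy_per_cycle = np.sum(0.5 * (h + h_next) * (b_next - b))
    return freq * energy_per_cycle
```

For sinusoidal B = B0 sin(wt) and H = H0 sin(wt + phi), the loop integral is pi * B0 * H0 * sin(phi), which the sketch reproduces to within a small discretization error.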
( 3
min )
This paper presents a {\delta}-PI algorithm, based on the damped Newton
method, for the H{\infty} tracking control problem of unknown continuous-time
nonlinear systems. A discounted performance function and an augmented system
are used to obtain the tracking Hamilton-Jacobi-Isaacs (HJI) equation. The
tracking HJI equation is a nonlinear partial differential equation; traditional
reinforcement learning methods for solving it are mostly based on the Newton
method, which usually only guarantees local convergence and needs a good
initial guess. Based on the damped Newton iteration operator equation, a
generalized tracking Bellman equation is first derived. The {\delta}-PI
algorithm seeks the optimal solution of the tracking HJI equation by
iteratively solving the generalized tracking Bellman equation. On-policy and
off-policy {\delta}-PI reinforcement learning methods are provided. The
off-policy {\delta}-PI algorithm is model-free and can be run without a priori
knowledge of the system dynamics. An NN-based implementation scheme for the
off-policy {\delta}-PI algorithms is presented. The suitability of the
model-free {\delta}-PI algorithm is illustrated with a nonlinear system
simulation.
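Why damping helps when the initial guess is poor can be seen on a scalar toy problem. The sketch below is a generic damped Newton iteration x <- x - delta * f(x) / f'(x), not the {\delta}-PI algorithm itself: for f(x) = arctan(x), the full Newton step (delta = 1) from x_0 = 2 is known to diverge, whereas delta = 0.5 converges to the root x = 0.

```python
import math

def damped_newton(f, df, x0, delta=1.0, iters=50):
    """Damped Newton iteration for a scalar equation f(x) = 0."""
    x = x0
    for _ in range(iters):
        x = x - delta * f(x) / df(x)  # delta = 1 recovers the plain Newton step
    return x
```

Usage: with `f = math.atan` and `df = lambda x: 1.0 / (1.0 + x * x)`, `damped_newton(f, df, 2.0, delta=1.0, iters=5)` has already grown beyond 1e6 in magnitude, while `damped_newton(f, df, 2.0, delta=0.5, iters=60)` is essentially 0. The price of damping is a slower (linear) local rate.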
( 2
min )
An innovative methodology that leverages artificial intelligence (AI) and
graph representation for semiconductor device encoding in TCAD device
simulation is proposed. A graph-based universal encoding scheme is presented
that not only considers material-level and device-level embeddings, but also
introduces a novel spatial relationship embedding inspired by interpolation
operations typically used in finite element meshing. Universal physical laws
from device simulations are leveraged for comprehensive data-driven modeling,
which encompasses surrogate Poisson emulation and current-voltage (IV)
prediction based on drift-diffusion model. Both are achieved using a novel
graph attention network, referred to as RelGAT. Comprehensive technical details
based on the device simulator Sentaurus TCAD are presented, empowering
researchers to adopt the proposed AI-driven Electronic Design Automation (EDA)
solution at the device level.
( 2
min )
The electrophysiological nature of neuronal networks makes it possible to reveal
various interactions between different cell units at very short time scales. One of
the many challenges in analyzing these signals is to retrieve the morphology
and functionality of a given network. In this work we developed a computational
model, based on Reservoir Computing Network (RCN) architecture, which decodes
the spatio-temporal data from electro-physiological measurements of neuronal
cultures and reconstructs the network structure on a macroscopic domain,
representing the connectivity between neuronal units. We demonstrate that the
model can predict the connectivity map of the network with higher accuracy than
the common methods such as Cross-Correlation and Transfer-Entropy. In addition,
we experimentally demonstrate the ability of the model to predict a network
response to a specific input, such as localized stimulus.
( 2
min )
This paper develops a new variant of the two-time-scale stochastic
approximation to find the roots of two coupled nonlinear operators, assuming
only noisy samples of these operators can be observed. Our key idea is to
leverage the classic Ruppert-Polyak averaging technique to dynamically estimate
the operators through their samples. The estimated values of these averaging
steps will then be used in the two-time-scale stochastic approximation updates
to find the desired solution. Our main theoretical result shows that, when the
underlying nonlinear operators are strongly monotone, the
mean-squared errors of the iterates generated by the proposed method converge
to zero at an optimal rate $\mathcal{O}(1/k)$, where $k$ is the number of
iterations. Our result significantly improves on existing results for
two-time-scale stochastic approximation, where the best known finite-time
convergence rate is $\mathcal{O}(1/k^{2/3})$.
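A minimal scalar sketch of the idea follows. Noisy samples of two coupled operators F and G are smoothed by running averages, and those averages, rather than the raw samples, drive a slow update of x and a fast update of y. The step-size schedules (1/(k+1) for x, 1/(k+1)^(2/3) for y and for the averaging) and the exponential-averaging form are assumptions for illustration, not the paper's exact choices.

```python
import numpy as np

def averaged_two_time_scale(F, G, x0, y0, n_iters, noise_std=1.0, seed=0):
    """Two-time-scale SA driven by running averages of noisy operator samples."""
    rng = np.random.default_rng(seed)
    x, y = float(x0), float(y0)
    f_hat, g_hat = 0.0, 0.0  # running estimates of F(x, y) and G(x, y)
    for k in range(n_iters):
        alpha = 1.0 / (k + 1)            # slow step size for x (assumed)
        beta = 1.0 / (k + 1) ** (2 / 3)  # fast step size for y (assumed)
        lam = beta                       # averaging step size (assumed)
        # Only noisy samples of the operators are observed.
        f_sample = F(x, y) + noise_std * rng.standard_normal()
        g_sample = G(x, y) + noise_std * rng.standard_normal()
        # Averaging step: smooth the samples before using them.
        f_hat += lam * (f_sample - f_hat)
        g_hat += lam * (g_sample - g_hat)
        # Two-time-scale updates driven by the averaged estimates.
        x -= alpha * f_hat
        y -= beta * g_hat
    return x, y
```

As a usage example, F(x, y) = x + 0.5 y - 1 and G(x, y) = y - x - 2 are jointly strongly monotone with root (0, 2), and 50,000 iterations bring the iterates close to it.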
( 2
min )
We consider the design of fast and reliable neural network (NN)-based
approximations of traditional stabilizing controllers for linear systems with
polytopic uncertainty, including control laws with variable structure and those
based on a (minimal) selection policy. Building upon recent approaches for the
design of reliable control surrogates with guaranteed structural properties, we
develop a systematic procedure to certify the closed-loop stability and
performance of a linear uncertain system when a trained rectified linear unit
(ReLU)-based approximation replaces such traditional controllers. First, we
provide a sufficient condition, which involves the worst-case approximation
error between ReLU-based and traditional controller-based state-to-input
mappings, ensuring that the system is ultimately bounded within a set with
adjustable size and convergence rate. Then, we develop an offline,
mixed-integer optimization-based method that allows us to compute that quantity
exactly.
( 2
min )
Natural language processing has made progress in incorporating human context
into its models, but whether it is more effective to use group-wise attributes
(e.g., over-45-year-olds) or to model individuals remains open. Group attributes
are technically easier but coarse: not all 45-year-olds write the same way. In
contrast, modeling individuals captures the complexity of each person's
identity. It allows for a more personalized representation, but it may require
modeling an unbounded number of users and collecting data that may be
impossible to obtain. We compare modeling human context via group attributes, individual users,
and combined approaches. Combining group and individual features significantly
benefits user-level regression tasks like age estimation or personality
assessment from a user's documents. Modeling individual users significantly
improves the performance of single document-level classification tasks like
stance and topic detection. We also find that individual-user modeling does
well even without a user's historical data.
( 2
min )
Inspired by regularization techniques in statistics and machine learning, we
study complementary composite minimization in the stochastic setting. This
problem corresponds to the minimization of the sum of a (weakly) smooth
function endowed with a stochastic first-order oracle, and a structured
uniformly convex (possibly nonsmooth and non-Lipschitz) regularization term.
Despite intensive work on closely related settings, prior to our work no
complexity bounds for this problem were known. We close this gap by providing
novel excess risk bounds, both in expectation and with high probability. Our
algorithms are nearly optimal, which we prove via novel lower complexity bounds
for this class of problems. We conclude by providing numerical results
comparing our methods to the state of the art.
( 2
min )
Reinforcement learning is an emerging approach to multi-stage
sequential decision-making problems. This paper studies real-time multi-stage
stochastic power dispatch considering multivariate uncertainties. Current
research suffers from low generalization and practicality: the
learned dispatch policy can only handle a specific dispatch scenario, and its
performance degrades significantly if actual samples and training samples are
inconsistent. To fill these gaps, a novel contextual meta graph reinforcement
learning (Meta-GRL) approach for a highly generalized multi-stage optimal
dispatch policy is proposed. Specifically, a more general contextual Markov
decision process (MDP) and a scalable graph representation are introduced to
achieve more generalized multi-stage stochastic power dispatch modeling. An
upper meta-learner is proposed to encode context for different dispatch
scenarios and learn how to identify dispatch tasks, while the lower policy
learner learns context-specific dispatch policies. After sufficient offline
learning, this approach can rapidly adapt to unseen and undefined scenarios
with only a few updates of the hypothesis judgments generated by the
meta-learner. Numerical comparisons with state-of-the-art policies and
traditional reinforcement learning verify the optimality, efficiency,
adaptability, and scalability of the proposed Meta-GRL.
( 2
min )
Accurate uncertainty measurement is a key step to building robust and
reliable machine learning systems. Conformal prediction is a distribution-free
uncertainty quantification algorithm popular for its ease of implementation,
statistical coverage guarantees, and versatility with respect to the underlying
forecaster. However, existing conformal prediction algorithms for time series are limited
to single-step prediction without considering the temporal dependency. In this
paper, we propose a Copula Conformal Prediction algorithm for multivariate,
multi-step Time Series forecasting, CopulaCPTS. We prove that CopulaCPTS has a
finite-sample validity guarantee. On several synthetic and real-world
multivariate time series datasets, we show that CopulaCPTS produces
better-calibrated and sharper confidence intervals for multi-step prediction
tasks than existing techniques.
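For context, standard single-step split conformal prediction, the kind of method this work extends, sets the interval half-width to the ceil((n + 1)(1 - alpha))-th smallest absolute residual on n held-out calibration points. The sketch below shows only this baseline; CopulaCPTS additionally couples the per-step scores across the forecast horizon with a copula, which is not shown, and the function names are illustrative.

```python
import math
import numpy as np

def conformal_quantile(scores, alpha):
    """Finite-sample-valid quantile of calibration nonconformity scores."""
    scores = np.sort(np.asarray(scores, dtype=float))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return float(scores[k - 1])

def split_conformal_interval(pred, cal_pred, cal_true, alpha=0.1):
    """Symmetric interval around `pred` from absolute calibration residuals."""
    residuals = np.abs(np.asarray(cal_true) - np.asarray(cal_pred))
    q = conformal_quantile(residuals, alpha)
    return pred - q, pred + q
```

With exchangeable data, the resulting intervals cover the true value with probability at least 1 - alpha; in practice the empirical coverage on a large test set lands close to 1 - alpha.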
( 2
min )
Deep learning (DL) is gaining popularity as a parameter estimation method for
quantitative MRI. A range of competing implementations have been proposed,
relying on either supervised or self-supervised learning. Self-supervised
approaches, sometimes referred to as unsupervised, have been loosely based on
auto-encoders, whereas supervised methods have, to date, been trained on
ground-truth labels. These two learning paradigms have been shown to have
distinct strengths. Notably, self-supervised approaches have offered lower-bias
parameter estimates than their supervised alternatives. This result is
counterintuitive: incorporating prior knowledge with supervised labels should,
in theory, lead to improved accuracy. In this work, we show that this apparent
limitation of supervised approaches stems from the naive choice of ground-truth
training labels. By training on labels which are deliberately not ground truth,
we show that the low-bias parameter estimation previously associated with
self-supervised methods can be replicated, and improved on, within a
supervised learning framework. This approach sets the stage for a single,
unifying, deep learning parameter estimation framework, based on supervised
learning, where trade-offs between bias and variance are made by careful
adjustment of training labels.
( 3
min )
Literature-Based Discovery (LBD) aims to discover new scientific knowledge by
mining papers and generating hypotheses. Standard LBD is limited to predicting
pairwise relations between discrete concepts (e.g., drug-disease links), and
ignores critical contexts like experimental settings (e.g., a specific patient
population where a drug is evaluated) and background motivations (e.g., to find
drugs without specific side effects). We address these limitations with a novel
formulation of contextualized-LBD (C-LBD): generating scientific hypotheses in
natural language, while grounding them in a context that controls the
hypothesis search space. We present a modeling framework using retrieval of
``inspirations'' from past scientific papers. Our evaluations reveal that GPT-4
tends to generate ideas with overall low technical depth and novelty, while our
inspiration prompting approaches partially mitigate this issue. Our work
represents a first step toward building language models that generate new ideas
derived from scientific literature.
( 2
min )
We introduce and investigate the iterated application of Generalized Matrix
Learning Vector Quantization for the analysis of feature relevances in
classification problems, as well as for the construction of
class-discriminative subspaces. The suggested Iterated Relevance Matrix
Analysis (IRMA) identifies a linear subspace representing the classification
specific information of the considered data sets using Generalized Matrix
Learning Vector Quantization (GMLVQ). By iteratively determining a new
discriminative subspace while projecting out all previously identified ones, a
combined subspace carrying all class-specific information can be found. This
facilitates a detailed analysis of feature relevances, and enables improved
low-dimensional representations and visualizations of labeled data sets.
Additionally, the IRMA-based class-discriminative subspace can be used for
dimensionality reduction and the training of robust classifiers with
potentially improved performance.
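The iterate-and-project loop can be sketched as follows. Because a GMLVQ implementation is beyond a short example, a two-class Fisher discriminant direction is swapped in as a stand-in for the discriminative subspace found at each step; the loop structure (find a direction, project it out, repeat) is the part being illustrated, and the function names are hypothetical.

```python
import numpy as np

def fisher_direction(X, y):
    """Two-class Fisher discriminant direction (a stand-in for a trained GMLVQ)."""
    X0, X1 = X[y == 0], X[y == 1]
    Sw = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)
    # Pseudo-inverse: after projections, Sw is singular along removed directions.
    return np.linalg.pinv(Sw, rcond=1e-10) @ (X1.mean(axis=0) - X0.mean(axis=0))

def iterated_subspaces(X, y, n_iter):
    """Repeatedly find a discriminative direction and project it out of the data."""
    X = X - X.mean(axis=0)
    directions = []
    for _ in range(n_iter):
        w = fisher_direction(X, y)
        for d in directions:  # re-orthogonalize against round-off
            w = w - (w @ d) * d
        w = w / np.linalg.norm(w)
        directions.append(w)
        X = X - np.outer(X @ w, w)  # remove the component along w
    return np.array(directions)
```

The rows of the returned array are orthonormal and together span a combined discriminative subspace, mirroring how IRMA accumulates class-specific information across iterations.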
( 2
min )
In this work, we compare emergent communication (EC) built upon multi-agent
deep reinforcement learning (MADRL) and language-oriented semantic
communication (LSC) empowered by a pre-trained large language model (LLM) using
human language. In a multi-agent remote navigation task, with multimodal input
data comprising location and channel maps, it is shown that EC incurs high
training cost and struggles when using multimodal data, whereas LSC yields high
inference computing cost due to the LLM's large size. To address their
respective bottlenecks, we propose a novel framework of language-guided EC
(LEC) by guiding the EC training using LSC via knowledge distillation (KD).
Simulations corroborate that LEC achieves shorter travel times while avoiding
areas with poor channel conditions, as well as speeding up the MADRL training
convergence by up to 61.8% compared to EC.
( 2
min )
As social media platforms are evolving from text-based forums into
multi-modal environments, the nature of misinformation in social media is also
transforming accordingly. Taking advantage of the fact that visual modalities
such as images and videos are more appealing to users, while textual
content is sometimes skimmed carelessly, misinformation spreaders
have recently targeted the contextual connections between modalities, e.g.,
text and images. Hence, many researchers have developed automatic techniques
for detecting possible cross-modal discordance in web-based content. We
analyze, categorize, and identify existing approaches, along with the
challenges and shortcomings they face, in order to unearth new research
opportunities in the field of multi-modal misinformation detection.
( 2
min )
Predicting the infiltration of Glioblastoma (GBM) from medical MRI scans is
crucial for understanding tumor growth dynamics and designing personalized
radiotherapy treatment plans. Mathematical models of GBM growth can complement
the data in the prediction of spatial distributions of tumor cells. However,
this requires estimating patient-specific parameters of the model from clinical
data, which is a challenging inverse problem due to limited temporal data and
the limited time between imaging and diagnosis. This work proposes a method
that uses Physics-Informed Neural Networks (PINNs) to estimate patient-specific
parameters of a reaction-diffusion PDE model of GBM growth from a single 3D
structural MRI snapshot. PINNs embed both the data and the PDE into a loss
function, thus integrating theory and data. Key innovations include the
identification and estimation of characteristic non-dimensional parameters, a
pre-training step that utilizes the non-dimensional parameters and a
fine-tuning step to determine the patient-specific parameters. Additionally,
the diffuse domain method is employed to handle the complex brain geometry
within the PINN framework. Our method is validated both on synthetic and
patient datasets, and shows promise for real-time parametric inference in the
clinical setting for personalized GBM treatment.
( 3
min )
Recently, incorporating large amounts of unlabeled in-the-wild facial images
into supervised Facial Action Unit (AU) detection frameworks has become a
challenging problem. In this paper, we propose a new AU detection framework
where multi-task learning is introduced to jointly learn AU domain separation
and reconstruction and facial landmark detection by sharing the parameters of
homostructural facial extraction modules. In addition, we propose a new feature
alignment scheme based on contrastive learning by simple projectors and an
improved contrastive loss, which adds four additional intermediate supervisors
to promote the feature reconstruction process. Experimental results on two
benchmarks demonstrate the superiority of our method over state-of-the-art
methods for AU detection in the wild.
( 2
min )
Graph-based collaborative filtering methods perform strongly in
recommender systems because they can capture high-order information between
users and items. However, their graphs are constructed from observed user-item
interactions, which may miss links or contain spurious positive interactions in
industrial scenarios. The Bayesian Graph Neural Network framework approaches
this issue with generative models for the interaction graphs. The critical
problem is to devise a proper family of graph generative models tailored to
recommender systems. We propose an efficient generative model that jointly
considers the preferences of users, the concurrence of items and some important
graph structure information. Experiments on four popular benchmark datasets
demonstrate the effectiveness of our proposed graph generative methods for
recommender systems.
( 2
min )
Phosphorus removal is vital in wastewater treatment to reduce reliance on
limited resources. Deep reinforcement learning (DRL) is a machine learning
technique that can optimize complex and nonlinear systems, including the
processes in wastewater treatment plants, by learning control policies through
trial and error. However, applying DRL to chemical and biological processes is
challenging due to the need for accurate simulators. This study trained six
models to identify the phosphorus removal process and used them to create a
simulator for the DRL environment. Although the models achieved high accuracy
(>97%), uncertainty and incorrect prediction behavior limited their performance
as simulators over longer horizons. Compounding errors in the models'
predictions were identified as one of the causes of this problem. This approach
for improving process control involves creating simulation environments for DRL
algorithms, using data from supervisory control and data acquisition (SCADA)
systems with a sufficient historical horizon without complex system modeling or
parameter estimation.
( 2
min )
Deep learning models have become increasingly popular for flood prediction
due to their superior accuracy and efficiency compared to traditional methods.
However, current machine learning methods often rely on separate spatial or
temporal feature analysis and have limitations on the types, number, and
dimensions of input data. This study presented a CNN-RNN hybrid feature fusion
modelling approach for urban flood prediction, which integrated the strengths
of CNNs in processing spatial features and RNNs in analyzing different
dimensions of time sequences. This approach allowed for both static and dynamic
flood predictions. Bayesian optimization was applied to identify the seven most
influential flood-driving factors and determine the best combination strategy.
By combining four CNNs (FCN, UNet, SegNet, DeepLabv3+) and three RNNs (LSTM,
BiLSTM, GRU), the optimal hybrid model was identified as LSTM-DeepLabv3+. This
model achieved the highest prediction accuracy (MAE, RMSE, NSE, and KGE were
0.007, 0.025, 0.973 and 0.755, respectively) under various rainfall input
conditions. Additionally, processing speed improved significantly: inference
took 1.158 s, approximately 1/125 of the computation time of the traditional
physically based models.
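The NSE and KGE values quoted above are standard hydrological skill scores. Assuming the study used the usual definitions, NSE = 1 - sum((s - o)^2) / sum((o - mean(o))^2), and KGE = 1 - sqrt((r - 1)^2 + (a - 1)^2 + (b - 1)^2) with r the linear correlation, a = std(s)/std(o), and b = mean(s)/mean(o) (the original 2009 form of KGE; later variants exist). A minimal NumPy sketch:

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the observed mean."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

def kge(obs, sim):
    """Kling-Gupta efficiency (2009 form): 1 is perfect."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    r = np.corrcoef(obs, sim)[0, 1]  # correlation component
    a = sim.std() / obs.std()        # variability ratio
    b = sim.mean() / obs.mean()      # bias ratio
    return 1.0 - np.sqrt((r - 1) ** 2 + (a - 1) ** 2 + (b - 1) ** 2)
```

A simulation that merely predicts the observed mean scores NSE = 0, and one that doubles every observation scores KGE = 1 - sqrt(2), which shows how KGE penalizes variability and bias errors separately from correlation.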
( 2
min )
In the era of the Internet of Things (IoT), decentralized paradigms for
machine learning are gaining prominence. In this paper, we introduce a
federated learning model that capitalizes on the Euclidean distance between
device model weights to assess their similarity and disparity. This is
foundational for our system, directing the formation of coalitions among
devices based on the closeness of their model weights. Furthermore, the concept
of a barycenter, representing the average of model weights, helps in the
aggregation of updates from multiple devices. We evaluate our approach using
homogeneous and heterogeneous data distributions, comparing it against the
traditional federated averaging algorithm. Numerical results
demonstrate its potential to offer a structured, better-performing, and
communication-efficient model for IoT-based machine learning.
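One simple way to realize distance-based coalitions is sketched below. Each device's flattened weight vector joins the first coalition whose barycenter (mean weight vector) lies within a Euclidean threshold; otherwise it starts a new coalition. Each coalition is then aggregated by its barycenter. The greedy rule, the threshold parameter, and the function names are illustrative assumptions; the paper's exact coalition-formation procedure may differ.

```python
import numpy as np

def form_coalitions(weights, threshold):
    """Greedily group devices whose weights are close to a coalition's barycenter."""
    coalitions = []  # each entry is a list of device indices
    for i, w in enumerate(weights):
        for members in coalitions:
            barycenter = np.mean([weights[j] for j in members], axis=0)
            if np.linalg.norm(w - barycenter) <= threshold:
                members.append(i)
                break
        else:  # no coalition was close enough: start a new one
            coalitions.append([i])
    return coalitions

def coalition_barycenters(weights, coalitions):
    """Aggregate each coalition by averaging its members' weight vectors."""
    return [np.mean([weights[j] for j in members], axis=0) for members in coalitions]
```

With weights near (0, 0) and near (5, 5) and a threshold of 1, the four devices split into two coalitions, each aggregated separately rather than averaged into a single global model.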
( 2
min )
We develop a versatile framework for statistical learning in non-stationary
environments. In each time period, our approach applies a stability principle
to select a look-back window that maximizes the utilization of historical data
while keeping the cumulative bias within an acceptable range relative to the
stochastic error. Our theory showcases the adaptability of this approach to
unknown non-stationarity. The regret bound is minimax optimal up to logarithmic
factors when the population losses are strongly convex, or Lipschitz only. At
the heart of our analysis lie two novel components: a measure of similarity
between functions and a segmentation technique for dividing the non-stationary
data sequence into quasi-stationary pieces.
( 2
min )
Neural Architecture Search (NAS) has become the de facto approach for
designing accurate and efficient networks for edge devices. Since models are
typically quantized for edge deployment, recent work has investigated
quantization-aware NAS (QA-NAS) to search for highly accurate and efficient
quantized models. However, existing QA-NAS approaches, particularly few-bit
mixed-precision (FB-MP) methods, do not scale to larger tasks. Consequently,
QA-NAS has mostly been limited to low-scale tasks and tiny networks. In this
work, we present an approach to enable QA-NAS (INT8 and FB-MP) on large-scale
tasks by leveraging the block-wise formulation introduced by block-wise NAS. We
demonstrate strong results for the semantic segmentation task on the Cityscapes
dataset, finding FB-MP models 33% smaller and INT8 models 17.6% faster than
DeepLabV3 (INT8) without compromising task performance.
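For readers unfamiliar with INT8 quantization, a minimal sketch of symmetric per-tensor quantization follows: weights are divided by a scale of max|w|/127, rounded, and clipped to [-127, 127]. This is a generic scheme assumed for illustration; the paper's quantizer and its few-bit mixed-precision settings may differ.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor INT8 quantization; returns (codes, scale)."""
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0  # avoid dividing by zero
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map INT8 codes back to real values."""
    return q.astype(np.float64) * scale
```

Each weight is then reconstructed to within half a quantization step (scale / 2), and the largest-magnitude weight maps to code 127 or -127.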
( 2
min )
In this paper, we present a novel bilevel optimization-based approach to
training acoustic models for automatic speech recognition (ASR)
tasks that we term bi-level joint unsupervised and supervised training
(BL-JUST). BL-JUST employs a lower and upper level optimization with an
unsupervised loss and a supervised loss respectively, leveraging recent
advances in penalty-based bilevel optimization to solve this challenging ASR
problem with affordable complexity and rigorous convergence guarantees. To
evaluate BL-JUST, extensive experiments on the LibriSpeech and TED-LIUM v2
datasets have been conducted. BL-JUST achieves superior performance over the
commonly used pre-training followed by fine-tuning strategy.
( 2
min )
In this study, we investigated the potential of GPT-3 for the anti-cancer
drug sensitivity prediction task using structured pharmacogenomics data across
five tissue types and evaluated its performance with zero-shot prompting and
fine-tuning paradigms. The drug's SMILES representation and the cell line's genomic
mutation features were predictive of the drug response. The results from this
study have the potential to pave the way for designing more efficient treatment
protocols in precision oncology.
( 2
min )
Interactive Machine Learning (IML) seeks to integrate human expertise into
machine learning processes. However, most existing algorithms cannot be applied
to real-world scenarios because their state spaces and/or action spaces are
limited to discrete values. Furthermore, the interaction of all existing
methods is restricted to deciding between multiple proposals. We therefore
propose a novel framework based on Bayesian Optimization (BO). Interactive
Bayesian Optimization (IBO) enables collaboration between machine learning
algorithms and humans. This framework captures user preferences and provides an
interface for users to shape the strategy by hand. Additionally, we have
incorporated a new acquisition function, Preference Expected Improvement (PEI),
to refine the system's efficiency using a probabilistic model of the user
preferences. Our approach is geared towards ensuring that machines can benefit
from human expertise, aiming for a more aligned and effective learning process.
In the course of this work, we applied our method to simulations and to a real-world
task using a Franka Panda robot to demonstrate human-robot collaboration.
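As background for PEI, the standard Expected Improvement acquisition for maximization is sketched below. For a Gaussian posterior with mean mu and standard deviation sigma at a candidate point and incumbent best value f*, EI = (mu - f* - xi) * Phi(z) + sigma * phi(z) with z = (mu - f* - xi) / sigma, where Phi and phi are the standard normal CDF and PDF. The sketch shows plain EI only; how PEI incorporates the probabilistic preference model is not reproduced here, and the exploration offset xi is an assumed parameter.

```python
import math

def expected_improvement(mu, sigma, best, xi=0.0):
    """Standard Expected Improvement (maximization) for a Gaussian posterior."""
    improvement = mu - best - xi
    if sigma <= 0.0:
        return max(improvement, 0.0)  # no uncertainty: deterministic improvement
    z = improvement / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return improvement * cdf + sigma * pdf
```

At a point whose posterior mean equals the incumbent (mu = best, sigma = 1), EI equals the standard normal density at 0, about 0.399; larger sigma raises EI, which is how the acquisition trades exploration against exploitation.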
( 2
min )
The aim of this work is to create and apply a methodological approach for
predicting gas traps from 3D seismic data and gas well testing. The paper
formalizes the approach to creating a training dataset by selecting volumes
with established gas saturation and filtration properties within the seismic
wavefield. The training dataset thus created is used in a process stack of
sequential application of data processing methods and ensemble machine learning
algorithms. As a result, a cube of calibrated probabilities that each point of
the study space belongs to a gas reservoir was obtained. The high efficiency of
this approach is shown on a held-out test sample of three wells (blind wells).
The final value of the gas reservoir prediction quality metric, the F1 score,
was 0.893846.
( 2
min )
Urban region embedding is an important and yet highly challenging issue due
to the complexity and constantly changing nature of urban data. To address the
challenges, we propose a Region-Wise Multi-View Representation Learning (ROMER)
to capture multi-view dependencies and learn expressive representations of
urban regions without the constraints of rigid neighbourhood region conditions.
Our model focuses on learning urban region representations from multi-source urban
data. First, we capture the multi-view correlations from mobility flow
patterns, POI semantics and check-in dynamics. Then, we adopt global graph
attention networks to learn similarity of any two vertices in graphs. To
comprehensively consider and share features of multiple views, a two-stage
fusion module is further proposed to learn weights with external attention to
fuse multi-view embeddings. Extensive experiments for two downstream tasks on
real-world datasets demonstrate that our model outperforms state-of-the-art
methods by up to 17\% improvement.
( 2
min )
Bayesian Neural Networks (BayNNs) naturally provide uncertainty in their
predictions, making them a suitable choice in safety-critical applications.
Additionally, their realization using memristor-based in-memory computing (IMC)
architectures makes them suitable for resource-constrained edge applications. In
addition to predictive uncertainty, however, the ability to be inherently
robust to noise in computation is also essential to ensure functional safety.
In particular, memristor-based IMCs are susceptible to various sources of
non-idealities such as manufacturing and runtime variations, drift, and
failure, which can significantly reduce inference accuracy. In this paper, we
propose a method to inherently enhance the robustness and inference accuracy of
BayNNs deployed in IMC architectures. To achieve this, we introduce a novel
normalization layer combined with stochastic affine transformations. Empirical
results on various benchmark datasets show graceful degradation in inference
accuracy, with an improvement of up to $58.11\%$.
( 2
min )
In this paper, we explore low-power custom quantised Multi-Layer Perceptrons
(MLPs) as an Intrusion Detection System (IDS) for automotive controller area
network (CAN). We utilise the FINN framework from AMD/Xilinx to quantise, train
and generate hardware IP of our MLP to detect denial of service (DoS) and
fuzzing attacks on the CAN network, using the ZCU104 (XCZU7EV) FPGA as our target ECU
architecture with integrated IDS capabilities. Our approach achieves
significant improvements in latency (0.12 ms per-message processing latency)
and inference energy consumption (0.25 mJ per inference) while achieving
classification performance similar to state-of-the-art approaches in the
literature.
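Quantised MLPs store weights as low-bit integers plus a scale factor. The sketch below shows symmetric uniform quantisation to signed k-bit integers, a common scheme in quantisation-aware training; it is an illustration under assumptions (one shared scale, 4-bit width, toy weights) and does not use the FINN or Brevitas APIs.

```python
def quantise_symmetric(weights, bits):
    """Map floats to signed ints in [-(2^(bits-1) - 1), 2^(bits-1) - 1] with one shared scale."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level and clip to the representable range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantise(q, scale):
    """Recover approximate float weights from integer levels."""
    return [v * scale for v in q]

weights = [0.8, -0.31, 0.05, -0.8, 0.42]   # hypothetical layer weights
q, scale = quantise_symmetric(weights, bits=4)
print(q)
print([round(w, 3) for w in dequantise(q, scale)])
```

On hardware, the integer levels drive small fixed-point multipliers, which is where the latency and energy savings of quantised inference come from.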
( 2
min )
We argue that insurance can act as an analogue for the social situatedness of
machine learning systems, hence allowing machine learning scholars to draw
insights from the rich and interdisciplinary insurance literature. Tracing the
interaction of uncertainty, fairness and responsibility in insurance provides a
fresh perspective on fairness in machine learning. We link insurance fairness
conceptions to their machine learning relatives, and use this bridge to
problematize fairness as calibration. In this process, we bring to the
forefront two themes that have been largely overlooked in the machine learning
literature: responsibility and aggregate-individual tensions.
( 2
min )
We present the first mini-batch algorithm for maximizing a non-negative
monotone decomposable submodular function, $F=\sum_{i=1}^N f^i$, under a set of
constraints. We improve on the sparsifier-based approach both in theory and
in practice. We experimentally observe that our algorithm generates solutions
that are far superior to those generated by the sparsifier-based approach.
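The mini-batch idea can be illustrated for a cardinality constraint: instead of evaluating every component f^i when computing a marginal gain of F, sample a random batch of components and use the batch sum, which is proportional to an unbiased estimate of the full gain. This is a hypothetical sketch (coverage functions as the f^i, uniform sampling, a `batch_size` parameter), not the paper's algorithm or its guarantees.

```python
import random

def coverage(chosen_sets, part):
    """f^i(S): how many elements of this component's sub-universe the chosen sets cover."""
    return len(set().union(*chosen_sets) & part)

def minibatch_greedy(ground, parts, k, batch_size, seed=0):
    """Greedy maximization of F = sum_i f^i subject to |S| <= k, estimating each
    marginal gain from a random mini-batch of the component functions."""
    rng = random.Random(seed)
    chosen = []
    for _ in range(k):
        batch = rng.sample(parts, batch_size)
        current = [ground[c] for c in chosen]
        best, best_gain = None, -1.0
        for name, elems in ground.items():
            if name in chosen:
                continue
            gain = sum(coverage(current + [elems], p) - coverage(current, p) for p in batch)
            if gain > best_gain:
                best, best_gain = name, gain
        chosen.append(best)
    return chosen

ground = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5, 6, 7, 8}, "d": {1}}
parts = [{1, 2, 3, 4}, {5, 6, 7, 8}]        # two hypothetical components f^1, f^2
print(minibatch_greedy(ground, parts, k=2, batch_size=2))
```

With `batch_size` equal to the number of components this reduces to the standard greedy algorithm; smaller batches trade accuracy of each gain estimate for fewer function evaluations.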
( 2
min )
We respond to the recent paper by Makelov et al. (2023), which reviews
subspace interchange intervention methods like distributed alignment search
(DAS; Geiger et al. 2023) and claims that these methods potentially cause
"interpretability illusions". We first review Makelov et al. (2023)'s technical
notion of what an "interpretability illusion" is, and then we show that even
intuitive and desirable explanations can qualify as illusions in this sense. As
a result, their method of discovering "illusions" can reject explanations they
consider "non-illusory". We then argue that the illusions Makelov et al. (2023)
see in practice are artifacts of their training and evaluation paradigms. We
close by emphasizing that, though we disagree with their core characterization,
Makelov et al. (2023)'s examples and discussion have undoubtedly pushed the
field of interpretability forward.
( 2
min )
Sound event localization and detection (SELD) is an important task in machine
listening. Major advancements rely on simulated data with sound events in
specific rooms and strong spatio-temporal labels. SELD data is simulated by
convolving spatially localized room impulse responses (RIRs) with sound
waveforms to place sound events in a soundscape. However, RIRs require manual
collection in specific rooms. We present SpatialScaper, a library for SELD data
simulation and augmentation. Compared to existing tools, SpatialScaper emulates
virtual rooms via parameters such as size and wall absorption. This allows for
parameterized placement (including movement) of foreground and background sound
sources. SpatialScaper also includes data augmentation pipelines that can be
applied to existing SELD data. As a case study, we use SpatialScaper to add
rooms to the DCASE SELD data. Training a model with our data led to progressive
performance improvements as a direct function of acoustic diversity. These
results show that SpatialScaper is valuable for training robust SELD models.
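The core simulation step, convolving a room impulse response with a dry source waveform, can be sketched in a few lines. The toy RIR (a direct path plus two decaying reflections) and the signal values are made up for illustration; real pipelines use measured or simulated multichannel RIRs (one per microphone), and this code does not use SpatialScaper's API.

```python
def convolve(signal, rir):
    """Full linear convolution: out[n] = sum_k signal[k] * rir[n - k]."""
    out = [0.0] * (len(signal) + len(rir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(rir):
            out[i + j] += s * h
    return out

dry = [1.0, 0.5, -0.5]           # hypothetical dry source samples
rir = [1.0, 0.0, 0.3, 0.0, 0.1]  # toy RIR: direct path + two reflections
wet = convolve(dry, rir)
print([round(x, 3) for x in wet])
```

Repeating this per microphone channel with position-dependent RIRs is what encodes the spatial cues a SELD model learns to localize.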
( 2
min )
Transfer learning for nonparametric regression is considered. We first study
the non-asymptotic minimax risk for this problem and develop a novel estimator
called the confidence thresholding estimator, which is shown to achieve the
minimax optimal risk up to a logarithmic factor. Our results demonstrate two
unique phenomena in transfer learning: auto-smoothing and super-acceleration,
which differentiate it from nonparametric regression in a traditional setting.
We then propose a data-driven algorithm that adaptively achieves the minimax
risk up to a logarithmic factor across a wide range of parameter spaces.
Simulation studies are conducted to evaluate the numerical performance of the
adaptive transfer learning algorithm, and a real-world example is provided to
demonstrate the benefits of the proposed method.
( 2
min )
In the modern transportation industry, accurate prediction of travelers' next
destinations brings multiple benefits to companies, such as customer
satisfaction and targeted marketing. This study focuses on developing a precise
model that captures the sequential patterns and dependencies in travel data,
enabling accurate predictions of individual travelers' future destinations. To
achieve this, a novel model architecture with a sliding window approach based
on Long Short-Term Memory (LSTM) is proposed for destination prediction in the
transportation industry. The experimental results show that the proposed model
performs well across different data sizes and performance metrics. This
research contributes to advancing
destination prediction methods, empowering companies to deliver personalized
recommendations and optimize customer experiences in the dynamic travel
landscape.
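A sliding window turns each traveler's ordered trip history into (input window, next destination) pairs on which a sequence model such as an LSTM can be trained. A minimal sketch; the window length and the destination codes are hypothetical.

```python
def sliding_windows(destinations, window):
    """Return (history, target) pairs: each history holds `window` consecutive
    destinations and the target is the destination that follows them."""
    pairs = []
    for start in range(len(destinations) - window):
        history = destinations[start:start + window]
        target = destinations[start + window]
        pairs.append((history, target))
    return pairs

trips = ["OSL", "BGO", "OSL", "TRD", "OSL", "BGO"]   # one hypothetical traveler
for history, target in sliding_windows(trips, window=3):
    print(history, "->", target)
```

Before training, the destination strings would be mapped to integer ids or embeddings; the window length controls how much travel history the model sees per prediction.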
( 2
min )
Object Detection (OD) has proven to be a significant computer vision method
for extracting localized class information and has multiple applications in
industry. Although many state-of-the-art (SOTA) OD models perform well on
medium- and large-sized objects, they tend to underperform on small objects.
In most industrial use cases, it is difficult to collect and annotate data for
small objects, as doing so is time-consuming and prone to human error.
Additionally, such datasets are likely to be imbalanced, which often leads to
inefficient model convergence. To tackle this challenge, this study presents a
novel approach that injects additional data points to improve the performance
of OD models. Synthetic data generation minimizes the difficulty of collecting
and annotating small-object data points and makes it possible to create a
dataset with a balanced class distribution. This paper discusses the effects of
a simple proportional class-balancing technique that enables better anchor
matching in OD models. We compare the performance of the SOTA OD models
YOLOv5, YOLOv7 and SSD on combinations of real and synthetic datasets within an
industrial use case.
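One simple reading of proportional class balancing, stated here as an assumption since the abstract does not give the rule: generate enough synthetic instances for each under-represented class to bring it up to the count of the largest class. Class names and counts are hypothetical.

```python
def synthetic_quota(class_counts):
    """Number of synthetic instances to generate per class so that every class
    reaches the size of the largest one."""
    target = max(class_counts.values())
    return {cls: target - n for cls, n in class_counts.items()}

real_counts = {"screw": 1200, "washer": 300, "nut": 450}   # hypothetical annotations
print(synthetic_quota(real_counts))
```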
( 3
min )
We report results of a longitudinal sentiment classification of Reddit posts
written by students of four major Canadian universities. We work with the texts
of the posts, concentrating on the years 2020-2023. By fine-tuning a
sentiment threshold within the range [-0.075, 0.075], we built classifiers
that categorize post sentiment as positive or negative. Notably, our
sentiment classification results are consistent across the four university
data sets.
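One plausible reading of the threshold procedure, sketched under assumptions: each post already has a polarity score from some scorer, a post is labelled positive when its score exceeds a threshold t, and t is picked from candidates in [-0.075, 0.075] by accuracy against labelled posts. The scores, labels and candidate grid are hypothetical.

```python
def classify(scores, threshold):
    """Label a post positive if its polarity score exceeds the threshold, else negative."""
    return ["positive" if s > threshold else "negative" for s in scores]

def sweep(scores, labels, candidates):
    """Return the candidate threshold with the highest accuracy on labelled posts."""
    def accuracy(t):
        preds = classify(scores, t)
        return sum(p == l for p, l in zip(preds, labels)) / len(labels)
    return max(candidates, key=accuracy)

scores = [0.6, 0.05, -0.02, -0.4, 0.1, -0.08]   # hypothetical polarity scores
labels = ["positive", "negative", "negative", "negative", "positive", "negative"]
candidates = [-0.075, -0.025, 0.025, 0.075]
print(sweep(scores, labels, candidates))
```

Scores close to zero are the ambiguous ones, so small shifts of t inside this narrow band decide most of the borderline posts.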
( 2
min )
We consider a regularized expected reward optimization problem in the
non-oblivious setting that covers many existing problems in reinforcement
learning (RL). In order to solve such an optimization problem, we apply and
analyze the classical stochastic proximal gradient method. In particular, the
method is shown to admit an $O(\epsilon^{-4})$ sample complexity to reach an
$\epsilon$-stationary point under standard conditions. Since the classical
stochastic gradient estimator typically has large variance, which slows down
convergence, we also apply an efficient stochastic variance-reduced proximal
gradient method with an importance-sampling-based ProbAbilistic Gradient
Estimator (PAGE). To the best of our knowledge, the application of this method
represents a novel approach in addressing the general regularized reward
optimization problem. Our analysis shows that the sample complexity can be
improved from $O(\epsilon^{-4})$ to $O(\epsilon^{-3})$ under additional
conditions. Our results on the stochastic (variance-reduced) proximal gradient
method match the sample complexity of their most competitive counterparts under
similar settings in the RL literature.
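The variance-reduction mechanism can be sketched on a toy problem. This is a hypothetical example, not the paper's algorithm: it minimizes an L1-regularized least-squares loss (for reward maximization one would negate the objective), omits the importance sampling used in the paper, and the probability `p`, batch sizes and step size are arbitrary. With probability `p` the gradient estimate is refreshed from a large batch; otherwise the previous estimate is corrected by a cheap small-batch gradient difference. Each step then applies the proximal operator of the regularizer (soft thresholding).

```python
import random

def soft_threshold(x, tau):
    """Proximal operator of tau * |x|."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0

def grad(data, idx, x):
    """Mini-batch gradient of the mean of 0.5 * (a * x - b)^2 over samples idx."""
    idx = list(idx)
    return sum(data[i][0] * (data[i][0] * x - data[i][1]) for i in idx) / len(idx)

def page_prox(data, lam, step, p, big, small, iters, seed=0):
    """Proximal gradient on mean loss + lam * |x| with a PAGE-style estimator."""
    rng = random.Random(seed)
    n = len(data)
    x = 0.0
    g = grad(data, range(n), x)              # start from an exact gradient
    for _ in range(iters):
        x_new = soft_threshold(x - step * g, step * lam)
        if rng.random() < p:                 # occasional large-batch refresh
            g = grad(data, rng.sample(range(n), big), x_new)
        else:                                # cheap small-batch correction
            idx = rng.sample(range(n), small)
            g = g + grad(data, idx, x_new) - grad(data, idx, x)
        x = x_new
    return x

data = [(1.0, 1.0), (1.0, 2.0), (1.0, 3.0), (1.0, 2.0)]   # toy (a_i, b_i) pairs
print(round(page_prox(data, lam=0.5, step=0.5, p=0.2, big=4, small=1, iters=100), 6))
```

Because every a_i equals 1 in this toy data, the small-batch gradient differences are exact, so the iterates converge to the regularized optimum mean(b) - lam = 1.5; with general data the corrections only approximate the full gradient change, and the occasional refresh keeps the accumulated error in check.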
( 2
min )
In continual learning, catastrophic forgetting is affected by multiple
aspects of the tasks. Previous works have analyzed separately how forgetting is
affected by either task similarity or overparameterization. In contrast, our
paper examines how task similarity and overparameterization jointly affect
forgetting in an analyzable model. Specifically, we focus on two-task continual
linear regression, where the second task is a random orthogonal transformation
of an arbitrary first task (an abstraction of random permutation tasks). We
derive an exact analytical expression for the expected forgetting and uncover
a nuanced pattern. In highly overparameterized models, intermediate task
similarity causes the most forgetting. However, near the interpolation
threshold, forgetting decreases monotonically with the expected task
similarity. We validate our findings with linear regression on synthetic data,
and with neural networks on established permutation task benchmarks.
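The setting can be simulated numerically to complement the analytical result. A minimal NumPy sketch under stated assumptions: noise-free teacher labels, task 2 obtained by rotating the inputs of task 1 with a random orthogonal matrix while keeping its labels, and each task trained to convergence from the previous solution (for gradient descent on least squares this is the minimum-norm update). Sizes and seeds are arbitrary; averaging over seeds estimates the expected forgetting.

```python
import numpy as np

def two_task_forgetting(n, d, seed=0):
    """Forgetting in two-task continual linear regression where task 2 is a
    random orthogonal transformation of task 1 (inputs rotated, same labels)."""
    rng = np.random.default_rng(seed)
    X1 = rng.standard_normal((n, d))
    y1 = X1 @ rng.standard_normal(d)               # realizable teacher labels
    Q, R = np.linalg.qr(rng.standard_normal((d, d)))
    O = Q * np.sign(np.diag(R))                    # Haar-random orthogonal matrix
    X2, y2 = X1 @ O, y1
    w1 = np.linalg.pinv(X1) @ y1                   # task 1, starting from w = 0
    w2 = w1 + np.linalg.pinv(X2) @ (y2 - X2 @ w1)  # task 2, starting from w1
    forgetting = np.mean((X1 @ w2 - y1) ** 2)      # task-1 loss after task 2
    task2_loss = np.mean((X2 @ w2 - y2) ** 2)
    return forgetting, task2_loss

n = 50
for d in (60, 200, 800):                           # increasing overparameterization
    avg = np.mean([two_task_forgetting(n, d, seed=s)[0] for s in range(10)])
    print(f"d={d}: mean forgetting {avg:.3f}")
```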
( 2
min )
This paper introduces SpecInfer, a system that accelerates generative large
language model (LLM) serving with tree-based speculative inference and
verification. The key idea behind SpecInfer is leveraging small speculative
models to predict the LLM's outputs; the predictions are organized as a token
tree, whose nodes each represent a candidate token sequence. The correctness of
all candidate token sequences represented by a token tree is verified against
the LLM in parallel using a novel tree-based parallel decoding mechanism.
SpecInfer uses an LLM as a token tree verifier instead of an incremental
decoder, which significantly reduces the end-to-end latency and computational
requirement for serving generative LLMs while provably preserving model
quality. Our evaluation shows that SpecInfer outperforms existing LLM serving
systems by 1.5-2.8x for distributed LLM inference and by 2.6-3.5x for
offloading-based LLM inference, while preserving the same generative
performance. SpecInfer is publicly available at
https://github.com/flexflow/FlexFlow/
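The verification step can be illustrated with a simplified, sequential sketch. Assumptions: greedy decoding, a token tree stored as nested dicts, and a hypothetical `toy_llm` callable standing in for the LLM. SpecInfer itself verifies all tree nodes in one parallel pass using tree-based parallel decoding, and it also supports stochastic decoding; neither is modelled here.

```python
def verify_token_tree(prefix, tree, next_token):
    """Greedy verification of a speculated token tree.

    tree: nested dict mapping a candidate token to its subtree of continuations.
    next_token: callable(tokens) -> the LLM's greedy next token.
    Returns the accepted tokens: the matched speculative path plus one LLM token."""
    accepted = []
    node = tree
    while True:
        tok = next_token(prefix + accepted)   # LLM's choice at this position
        accepted.append(tok)                  # always correct, so always kept
        if tok not in node:                   # speculation diverges or a leaf is reached
            return accepted
        node = node[tok]                      # speculation matched: descend

target = ["the", "cat", "sat", "on", "the", "mat"]

def toy_llm(tokens):
    """Hypothetical stand-in for the LLM: always continues the target sentence."""
    return target[len(tokens)] if len(tokens) < len(target) else "<eos>"

tree = {"cat": {"sat": {"in": {}}, "is": {}}, "dog": {"ran": {}}}
print(verify_token_tree(["the"], tree, toy_llm))
```

Here one verification step yields three tokens, two of which were speculated by the small model, which is where the latency reduction comes from.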
( 2
min )
In this paper, we investigate the intersection of large generative AI models
and cloud-native computing architectures. Recent large models such as ChatGPT,
while revolutionary in their capabilities, face challenges like escalating
costs and demand for high-end GPUs. Drawing analogies between
large-model-as-a-service (LMaaS) and cloud database-as-a-service (DBaaS), we
describe an AI-native computing paradigm that harnesses the power of both
cloud-native technologies (e.g., multi-tenancy and serverless computing) and
advanced machine learning runtimes (e.g., batched LoRA inference). These joint
efforts aim to optimize cost of goods sold (COGS) and improve resource
accessibility. The merging of these two domains is just beginning, and we hope
to stimulate future research and development in this area.
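Batched LoRA inference, cited above as a runtime technique, lets requests that use different fine-tuned adapters share one pass through the base weights. A minimal NumPy sketch of the arithmetic, y_i = W x_i + scale * B_a A_a x_i for request i using adapter a; the shapes, rank and tenant names are hypothetical, and production systems implement this with fused GPU kernels rather than a Python loop over adapters.

```python
import numpy as np

def batched_lora(W, adapters, X, adapter_ids, scale=1.0):
    """One shared base projection for the whole batch, plus each request's own
    low-rank update B @ A selected by adapter_ids."""
    Y = X @ W.T                                    # shared base model: (batch, d_out)
    for name in set(adapter_ids):
        rows = [i for i, a in enumerate(adapter_ids) if a == name]
        A, B = adapters[name]                      # A: (r, d_in), B: (d_out, r)
        Y[rows] += scale * (X[rows] @ A.T) @ B.T
    return Y

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 6, 2
W = rng.standard_normal((d_out, d_in))
adapters = {name: (rng.standard_normal((r, d_in)), rng.standard_normal((d_out, r)))
            for name in ("tenant_a", "tenant_b")}
X = rng.standard_normal((5, d_in))
ids = ["tenant_a", "tenant_b", "tenant_a", "tenant_b", "tenant_a"]
Y = batched_lora(W, adapters, X, ids, scale=0.5)
print(Y.shape)
```

Only the small A and B matrices differ between tenants, so one GPU-resident base model can serve many customized models, which is the multi-tenancy argument made above.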
( 2
min )
We propose an approach for curating multimodal data that we used for our
entry in the 2023 DataComp competition filtering track. Our technique combines
object detection and weak supervision-based ensembling. In the first of two
steps in our approach, we employ an out-of-the-box zero-shot object detection
model to extract granular information and produce a variety of filter designs.
In the second step, we employ weak supervision to ensemble filtering rules.
This approach results in a 4% performance improvement compared with the
best-performing baseline, earning the top-ranking position in the small-scale
track at the time of writing. Furthermore, in the medium-scale track, we
achieve a noteworthy 4.2% improvement over the baseline by simply ensembling
existing baselines with weak supervision.
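The second step, ensembling filtering rules, can be sketched with a plain majority vote. This is a simplification chosen for illustration: weak supervision frameworks usually fit a label model that weights rules by their estimated accuracy, and the rules and per-sample fields below (`clip_score`, `num_objects`, `caption_len`) are hypothetical, not the authors' filters. Each rule votes keep (1), drop (0) or abstains (None).

```python
def majority_vote(votes):
    """Combine rule outputs (1 keep, 0 drop, None abstain); ties and all-abstain drop."""
    keep = sum(v == 1 for v in votes)
    drop = sum(v == 0 for v in votes)
    return 1 if keep > drop else 0

def ensemble_filter(samples, rules):
    """Keep the samples that win the vote across all filtering rules."""
    return [s for s in samples if majority_vote([rule(s) for rule in rules]) == 1]

rules = [
    lambda s: 1 if s["clip_score"] > 0.28 else 0,     # image-text similarity rule
    lambda s: 1 if s["num_objects"] >= 1 else 0,      # object-detector rule
    lambda s: 0 if s["caption_len"] < 3 else None,    # only votes against tiny captions
]
samples = [
    {"id": "a", "clip_score": 0.31, "num_objects": 3, "caption_len": 12},
    {"id": "b", "clip_score": 0.20, "num_objects": 2, "caption_len": 8},
    {"id": "c", "clip_score": 0.35, "num_objects": 0, "caption_len": 2},
    {"id": "d", "clip_score": 0.30, "num_objects": 1, "caption_len": 2},
]
print([s["id"] for s in ensemble_filter(samples, rules)])
```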
( 2
min )
Most existing neural network-based approaches for solving stochastic optimal
control problems using the associated backward dynamic programming principle
rely on the ability to simulate the underlying state variables. However, in
some problems, this simulation is infeasible, leading to the discretization of
state variable space and the need to train one neural network for each data
point. This approach becomes computationally inefficient when dealing with
large state variable spaces. In this paper, we consider a class of this type of
stochastic optimal control problems and introduce an effective solution
employing multitask neural networks. To train our multitask neural network, we
introduce a novel scheme that dynamically balances the learning across tasks.
Through numerical experiments on real-world derivatives pricing problems, we
show that our method outperforms state-of-the-art approaches.
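The abstract does not specify how learning is balanced across tasks, so as a stand-in this sketch shows Dynamic Weight Average (DWA), a known multitask weighting rule from the literature and not the authors' scheme: a task's weight grows when its loss decreased less over the last two epochs. The loss values and the temperature are hypothetical; here each task could correspond to one discretized state-variable point.

```python
import math

def dwa_weights(losses_prev, losses_prev2, temperature=2.0):
    """Dynamic Weight Average: softmax of loss ratios L_k(t-1) / L_k(t-2),
    scaled so the weights sum to the number of tasks."""
    ratios = [a / b for a, b in zip(losses_prev, losses_prev2)]
    exps = [math.exp(r / temperature) for r in ratios]
    total = sum(exps)
    return [len(ratios) * e / total for e in exps]

def combined_loss(task_losses, weights):
    """Weighted sum of per-task losses used for one multitask update."""
    return sum(w * l for w, l in zip(weights, task_losses))

w = dwa_weights([0.9, 0.5, 0.7], [1.0, 1.0, 1.0])
print([round(v, 3) for v in w])
print(round(combined_loss([0.9, 0.5, 0.7], w), 3))
```

The slowest-improving task (ratio 0.9) receives the largest weight, which pushes the shared network to keep learning the tasks that lag behind.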
( 2
min )
In this study, we introduce Orion-14B, a collection of multilingual large
language models with 14 billion parameters. We utilize a data scheduling
approach to train a foundational model on a diverse corpus of 2.5 trillion
tokens, sourced from texts in English, Chinese, Japanese, Korean, and other
languages. Additionally, we fine-tuned a series of models tailored for
conversational applications and other specific use cases. Our evaluation
results demonstrate that Orion-14B achieves state-of-the-art performance across
a broad spectrum of tasks. We make the Orion-14B model family and its
associated code publicly accessible at https://github.com/OrionStarAI/Orion,
aiming to inspire future research and practical applications in the field.
( 2
min )
In this paper, we present a novel bilevel optimization-based approach to
training acoustic models for automatic speech recognition (ASR) tasks, which we
term bi-level joint unsupervised and supervised training (BL-JUST). BL-JUST
employs lower- and upper-level optimization with an unsupervised loss and a
supervised loss, respectively, leveraging recent advances in penalty-based
bilevel optimization to solve this challenging ASR problem with affordable
complexity and rigorous convergence guarantees. To evaluate BL-JUST, we
conducted extensive experiments on the LibriSpeech and TED-LIUM v2 datasets.
BL-JUST outperforms the commonly used strategy of pre-training followed by
fine-tuning.
( 2
min )
We develop a versatile framework for statistical learning in non-stationary
environments. In each time period, our approach applies a stability principle
to select a look-back window that maximizes the utilization of historical data
while keeping the cumulative bias within an acceptable range relative to the
stochastic error. Our theory showcases the adaptability of this approach to
unknown non-stationarity. The regret bound is minimax optimal up to logarithmic
factors when the population losses are either strongly convex or only
Lipschitz. At
the heart of our analysis lie two novel components: a measure of similarity
between functions and a segmentation technique for dividing the non-stationary
data sequence into quasi-stationary pieces.
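The look-back selection can be sketched for the simplest case, estimating a mean from a drifting sequence, with a Lepski-type stability rule: accept ever longer windows as long as their estimate stays within the stochastic-error band of every shorter window's estimate. The dyadic window grid, the noise level `sigma` and the constant `c` are assumptions for illustration; the paper's rule for general losses and its thresholds may differ.

```python
import math

def select_window(data, sigma, c=2.0):
    """Pick the longest look-back window whose mean stays within c * sigma / sqrt(j)
    of every shorter window's mean (j = that window's length).
    data is ordered oldest -> newest; windows are 1, 2, 4, ... most recent points."""
    n = len(data)
    windows = [k for k in (2 ** i for i in range(n.bit_length())) if k <= n]
    means = {k: sum(data[-k:]) / k for k in windows}
    best = windows[0]
    for k in windows:
        if all(abs(means[k] - means[j]) <= c * sigma / math.sqrt(j) for j in windows if j < k):
            best = k          # still stable: bias is small relative to stochastic error
        else:
            break             # older data is too different: stop extending
    return best

stationary = [1.0] * 64
shifted = [0.0] * 32 + [5.0] * 32     # hypothetical level shift 32 steps ago
print(select_window(stationary, sigma=0.1), select_window(shifted, sigma=0.1))
```

On stationary data the rule uses all history; after a shift it discards the pre-shift segment, which is the adaptivity to unknown non-stationarity described above.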
( 2
min )
This post discusses how enterprises can build accurate, transparent, and secure generative AI applications while keeping full control over proprietary data. The proposed solution is a RAG pipeline using an AI-native technology stack, whose components are designed from the ground up with AI at their core, rather than having AI capabilities added as an afterthought. We demonstrate how to build an end-to-end RAG application using Cohere’s language models through Amazon Bedrock and a Weaviate vector database on AWS Marketplace.
( 13
min )
In a major stride toward building a shared national research infrastructure, the U.S. National Science Foundation has launched the National Artificial Intelligence Research Resource pilot program with significant support from NVIDIA. The initiative aims to broaden access to the tools needed to power responsible AI discovery and innovation. It was announced Wednesday in partnership with …
( 7
min )
RTX Video HDR — first announced at CES — is now available for download through the January Studio Driver.
( 8
min )
Interatomic potentials learned using machine learning methods have been
successfully applied to atomistic simulations. However, accurate models require
large training datasets, while generating reference calculations is
computationally demanding. To bypass this difficulty, we propose a transfer
learning algorithm that leverages the ability of graph neural networks (GNNs)
to represent chemical environments together with kernel mean embeddings. We
extract a feature map from GNNs pre-trained on the OC20 dataset and use it to
learn the potential energy surface from system-specific datasets of catalytic
processes. Our method is further enhanced by incorporating chemical species
information into the kernel, resulting in improved performance and
interpretability. We test our approach on a series of realistic datasets of
increasing complexity, showing excellent generalization and transferability
performance, and improving on methods that rely on GNNs or ridge regression
alone, as well as similar fine-tuning approaches.
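The kernel construction can be sketched under stated assumptions: each structure is a set of atoms, each with a feature vector (random stand-ins here for the pre-trained GNN features) and a species label; the structure kernel is the inner product of kernel mean embeddings, i.e. the average Gaussian kernel over atom pairs, optionally restricted to same-species pairs; and energies are fitted with kernel ridge regression. Feature dimension, `gamma`, `lam` and the energies are arbitrary illustrative values.

```python
import numpy as np

def structure_kernel(S1, S2, gamma=0.5, use_species=True):
    """Inner product of the kernel mean embeddings of two atomic structures.
    S = (features of shape (n_atoms, d), species of shape (n_atoms,))."""
    X1, z1 = S1
    X2, z2 = S2
    sq = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    k = np.exp(-gamma * sq)                     # Gaussian kernel between atoms
    if use_species:
        k = k * (z1[:, None] == z2[None, :])    # only compare same-species atoms
    return k.mean()

def fit_krr(structures, energies, lam=1e-3, **kw):
    """Kernel ridge regression on structure kernels; returns a predictor."""
    K = np.array([[structure_kernel(a, b, **kw) for b in structures] for a in structures])
    alpha = np.linalg.solve(K + lam * np.eye(len(structures)), energies)
    def predict(S):
        return np.array([structure_kernel(S, b, **kw) for b in structures]) @ alpha
    return predict

rng = np.random.default_rng(0)
def random_structure(n_atoms):
    """Hypothetical structure: random 4-d atom features and species in {0, 1}."""
    return rng.standard_normal((n_atoms, 4)), rng.integers(0, 2, n_atoms)

train = [random_structure(n) for n in (3, 4, 5, 4)]
energies = np.array([-1.2, -0.7, -2.1, -1.0])   # made-up reference energies
predict = fit_krr(train, energies)
print(predict(train[0]))
```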
( 2
min )
Biomedical literature is growing rapidly, making it challenging to curate and
extract knowledge manually. Biomedical natural language processing (BioNLP)
techniques that can automatically extract information from biomedical
literature help alleviate this burden. Recently, large language models (LLMs),
such as GPT-3 and GPT-4, have gained significant attention for their impressive
performance. However, their effectiveness in BioNLP tasks and impact on method
development and downstream users remain understudied. This pilot study (1)
establishes the baseline performance of GPT-3 and GPT-4 in both zero-shot and
one-shot settings on eight BioNLP datasets across four applications: named
entity recognition, relation extraction, multi-label document classification,
and semantic similarity and reasoning, (2) examines the errors produced by the
LLMs and categorizes the errors into three types: missingness, inconsistencies,
and unwanted artificial content, and (3) provides suggestions for using LLMs in
BioNLP applications. We make the datasets, baselines, and results publicly
available to the community via
https://github.com/qingyu-qc/gpt_bionlp_benchmark.
( 2
min )
Using a vocabulary that is shared across languages is common practice in
Multilingual Neural Machine Translation (MNMT). Besides enabling a simple
design, shared tokens play an important role in positive knowledge transfer,
assuming that shared tokens refer to similar meanings across languages.
However, when word overlap is small, especially due to different writing
systems, transfer is inhibited. In this paper, we define word-level information
transfer pathways via word equivalence classes and rely on graph networks to
fuse word embeddings across languages. Our experiments demonstrate the
advantages of our approach: 1) embeddings of words with similar meanings are
better aligned across languages, 2) our method achieves consistent BLEU
improvements of up to 2.3 points for high- and low-resource MNMT, and 3) less
than 1.0% additional trainable parameters are required with a limited increase
in computational costs, while inference time remains identical to the baseline.
We release the codebase to the community.
( 2
min )
Current methods to identify and classify racist language in text rely on
small-n qualitative approaches or large-n approaches focusing exclusively on
overt forms of racist discourse. This article provides a step-by-step
generalizable guideline to identify and classify different forms of racist
discourse in large corpora. In our approach, we start by conceptualizing racism
and its different manifestations. We then contextualize these racist
manifestations to the time and place of interest, which allows researchers to
identify their discursive form. Finally, we apply XLM-RoBERTa (XLM-R), a
cross-lingual model for supervised text classification with a cutting-edge
contextual understanding of text. We show that XLM-R and XLM-R-Racismo, our
pretrained model, outperform other state-of-the-art approaches in classifying
racism in large corpora. We illustrate our approach using a corpus of tweets
relating to the Ecuadorian indígena community between 2018 and 2021.
( 2
min )
Predicting crowd intents and trajectories is crucial in various real-world
applications, including service robots and autonomous vehicles. Understanding
environmental dynamics is challenging, not only due to the complexities of
modeling pair-wise spatial and temporal interactions but also due to the
diverse influence of group-wise interactions. To decode the comprehensive
pair-wise and group-wise interactions in crowded scenarios, we introduce
Hyper-STTN, a Hypergraph-based Spatial-Temporal Transformer Network for crowd
trajectory prediction. In Hyper-STTN, crowded group-wise correlations are
constructed using a set of multi-scale hypergraphs with varying group sizes,
captured through random-walk probability-based hypergraph spectral convolution.
Additionally, a spatial-temporal transformer is adapted to capture pedestrians'
pair-wise latent interactions in spatial-temporal dimensions. These
heterogeneous group-wise and pair-wise interactions are then fused and aligned
through a multimodal transformer network. Hyper-STTN outperforms other
state-of-the-art baselines and ablation models on 5 real-world pedestrian
motion datasets.
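A random-walk normalized hypergraph convolution can be written as X' = D_v^{-1} H W D_e^{-1} H^T X Theta, where H is the node-hyperedge incidence matrix, W holds hyperedge weights, and D_v and D_e are the vertex and hyperedge degrees; the resulting propagation matrix is row-stochastic, i.e. a random-walk transition matrix. The sketch assumes a simple group construction (each pedestrian plus its k nearest neighbours, with k setting the scale); Hyper-STTN's actual hypergraph construction and learned parameters are not reproduced.

```python
import numpy as np

def knn_hypergraph(pos, k):
    """Incidence matrix with one hyperedge per pedestrian: itself plus its
    k nearest neighbours (group size k + 1)."""
    n = len(pos)
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    H = np.zeros((n, n))
    for e in range(n):
        members = np.argsort(dist[e])[: k + 1]   # includes e itself (distance 0)
        H[members, e] = 1.0
    return H

def hypergraph_conv(X, H, Theta, w=None):
    """Random-walk normalized hypergraph convolution D_v^-1 H W D_e^-1 H^T X Theta."""
    w = np.ones(H.shape[1]) if w is None else w
    d_e = H.sum(0)                                # hyperedge sizes
    d_v = H @ w                                   # weighted vertex degrees
    P = (H * w / d_e) @ H.T / d_v[:, None]        # row-stochastic propagation
    return P @ X @ Theta

rng = np.random.default_rng(0)
pos = rng.uniform(0, 10, size=(6, 2))            # hypothetical pedestrian positions
X = rng.standard_normal((6, 4))                  # per-pedestrian features
Theta = rng.standard_normal((4, 4))              # would be learned in practice
multi_scale = np.concatenate(
    [hypergraph_conv(X, knn_hypergraph(pos, k), Theta) for k in (1, 2, 3)], axis=1)
print(multi_scale.shape)
```

Concatenating the outputs for several group sizes gives one simple form of the multi-scale group-wise features described above.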
( 2
min )
Deep neural networks (DNNs) can be deceived by human-imperceptible
perturbations of clean samples. Therefore, enhancing the robustness of DNNs
against adversarial attacks is a crucial task. In this paper, we aim to train
robust DNNs by limiting the set of outputs reachable via a norm-bounded
perturbation added to a clean sample. We refer to this set as the adversarial
polytope; each clean sample has its own adversarial polytope. If the polytopes
of all samples are compact enough that they do not intersect the decision
boundaries of the DNN, then the DNN is robust against adversarial samples.
Hence, the inner workings of our algorithm are based on learning Confined
Adversarial Polytopes (CAP). Through a thorough set of experiments, we
demonstrate the effectiveness of CAP over existing adversarial robustness
methods in improving the robustness of models against state-of-the-art attacks,
including AutoAttack.
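To make the reachable-output set concrete, the sketch over-approximates it with interval bound propagation (IBP), a standard technique swapped in for illustration and not the CAP training method itself: a box around the input is pushed through each affine layer and ReLU, and a sample counts as certified when the true class's lower bound exceeds every other class's upper bound. The network weights and `eps` values are random, hypothetical choices.

```python
import numpy as np

def forward(layers, x):
    """Plain forward pass with ReLU on every hidden layer."""
    for i, (W, b) in enumerate(layers):
        x = W @ x + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0)
    return x

def interval_bounds(layers, x, eps):
    """Propagate the box [x - eps, x + eps] through affine + ReLU layers; the
    returned bounds contain every output reachable from inside the box."""
    lo, hi = x - eps, x + eps
    for i, (W, b) in enumerate(layers):
        center, radius = (hi + lo) / 2, (hi - lo) / 2
        center = W @ center + b
        radius = np.abs(W) @ radius
        lo, hi = center - radius, center + radius
        if i < len(layers) - 1:                   # ReLU is monotone
            lo, hi = np.maximum(lo, 0), np.maximum(hi, 0)
    return lo, hi

def certified(layers, x, eps, label):
    """True if no perturbation within eps can change the predicted class."""
    lo, hi = interval_bounds(layers, x, eps)
    return bool(lo[label] > np.delete(hi, label).max())

rng = np.random.default_rng(0)
layers = [(rng.standard_normal((8, 4)), rng.standard_normal(8)),
          (rng.standard_normal((3, 8)), rng.standard_normal(3))]
x = rng.standard_normal(4)
label = int(np.argmax(forward(layers, x)))
print(certified(layers, x, 0.01, label), certified(layers, x, 1.0, label))
```

A smaller output box, the kind of confinement CAP aims for, is less likely to cross a decision boundary, so more samples become certifiable.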
( 2
min )
Strategies for partially observable Markov decision processes (POMDP)
typically require memory. One way to represent this memory is via automata. We
present a method to learn an automaton representation of a strategy using a
modification of the L*-algorithm. Compared to the tabular representation of a
strategy, the resulting automaton is dramatically smaller and thus also more
explainable. Moreover, in the learning process, our heuristics may even improve
the strategy's performance. In contrast to approaches that synthesize an
automaton directly from the POMDP, thereby solving it, our approach is vastly
more scalable.
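An automaton strategy can be represented as a Mealy machine that maps (memory state, observation) to an action and a next memory state. The sketch shows only this representation, not the L*-based learning; the two-state key-and-door example is hypothetical. Two memory states suffice here to remember whether the key has been picked up.

```python
class MealyStrategy:
    """Finite-memory POMDP strategy:
    action = out[(memory, obs)], next memory = delta[(memory, obs)]."""

    def __init__(self, initial, delta, out):
        self.memory = initial
        self.delta = delta   # (memory, observation) -> next memory state
        self.out = out       # (memory, observation) -> action

    def act(self, observation):
        key = (self.memory, observation)
        self.memory = self.delta[key]
        return self.out[key]

delta = {(m, o): ("has_key" if (m == "has_key" or o == "key") else "no_key")
         for m in ("no_key", "has_key") for o in ("corridor", "key", "door")}
out = {("no_key", "corridor"): "forward", ("no_key", "key"): "pick_up",
       ("no_key", "door"): "go_back", ("has_key", "corridor"): "forward",
       ("has_key", "key"): "forward", ("has_key", "door"): "open"}

strategy = MealyStrategy("no_key", delta, out)
actions = [strategy.act(o) for o in ["corridor", "door", "key", "corridor", "door"]]
print(actions)
```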
( 2
min )
Stochastic differential equations (SDEs) have been widely used to model
real-world random phenomena. Existing works mainly focus on the case where the time
series is modeled by a single SDE, which might be restrictive for modeling time
series with distributional shift. In this work, we propose a change point
detection algorithm for time series modeled as neural SDEs. Given a time series
dataset, the proposed method jointly learns the unknown change points and the
parameters of distinct neural SDE models corresponding to each change point.
Specifically, the SDEs are learned under the framework of generative
adversarial networks (GANs) and the change points are detected based on the
output of the GAN discriminator in a forward pass. At each step of the proposed
algorithm, the change points and the SDE model parameters are updated in an
alternating fashion. Numerical results on both synthetic and real datasets are
provided to validate the performance of our algorithm in comparison to
classical change point detection benchmarks, standard GAN-based neural SDEs,
and other state-of-the-art deep generative models for time series data.
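The kind of data this method targets can be simulated with the Euler-Maruyama scheme, X_{t+dt} = X_t + mu(X_t) dt + sigma sqrt(dt) Z with Z ~ N(0, 1), switching the drift and diffusion at a change point. The sketch generates such a series with two hypothetical Ornstein-Uhlenbeck regimes; it does not implement the GAN-based detector.

```python
import math
import random

def simulate_regime_switching_sde(x0, dt, n_steps, change_point, regimes, seed=0):
    """Euler-Maruyama for dX = mu(X) dt + sigma dW, switching from regimes[0]
    to regimes[1] at step change_point. Each regime is (mu_fn, sigma)."""
    rng = random.Random(seed)
    path = [x0]
    for t in range(n_steps):
        mu, sigma = regimes[0] if t < change_point else regimes[1]
        x = path[-1]
        path.append(x + mu(x) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0))
    return path

# Calm mean reversion to 0, then noisier mean reversion to 2 (hypothetical regimes).
regimes = [(lambda x: -1.0 * (x - 0.0), 0.2), (lambda x: -1.0 * (x - 2.0), 1.0)]
path = simulate_regime_switching_sde(0.0, 0.01, 1000, 500, regimes)
print(len(path), round(path[500], 3), round(path[-1], 3))
```

A change point detector for this series has to recover step 500 and fit a separate SDE to each segment, which is the joint problem described above.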
( 2
min )
Time-series anomaly detection deals with the problem of detecting anomalous
timesteps by learning normality from the sequence of observations. However, the
concept of normality evolves over time, leading to a "new normal problem",
where the distribution of normality can change due to distribution
shifts between training and test data. This paper highlights the prevalence of
the new normal problem in unsupervised time-series anomaly detection studies.
To tackle this issue, we propose a simple yet effective test-time adaptation
strategy based on trend estimation and a self-supervised approach to learning
new normalities during inference. Extensive experiments on real-world
benchmarks demonstrate that incorporating the proposed strategy into the
anomaly detector consistently improves the model's performance compared to the
baselines, making it robust to distribution shifts.
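The effect of adapting to a new normal can be seen with a much simpler detector than the paper's: a score measured against an exponential-moving-average trend that keeps updating at test time. This stand-in (the smoothing factor `alpha` and the step-shaped stream are hypothetical) omits the self-supervised component but shows the contrast with a detector whose normal level is frozen after training.

```python
def ema_detrended_scores(stream, alpha=0.05, warmup_mean=0.0):
    """Online anomaly scores |x_t - trend_{t-1}|, where the trend is an
    exponential moving average updated during inference, so a persistent
    level shift (a new normal) is gradually absorbed instead of flagged forever."""
    trend = warmup_mean
    scores = []
    for x in stream:
        scores.append(abs(x - trend))
        trend = (1 - alpha) * trend + alpha * x
    return scores

stream = [0.0] * 50 + [3.0] * 200           # level shift at t = 50
adaptive = ema_detrended_scores(stream)
static = [abs(x - 0.0) for x in stream]      # normal level frozen at training time
print(round(adaptive[50], 3), round(adaptive[-1], 5), static[-1])
```

Both detectors flag the shift when it happens, but only the adaptive one stops flagging once the new level persists.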
( 2
min )
This paper introduces a novel approach for topic modeling utilizing latent
codebooks from a Vector-Quantized Variational Auto-Encoder (VQ-VAE), which
discretely encapsulate the rich information of pre-trained embeddings such as
those of a pre-trained language model. Based on a novel interpretation of the
latent codebooks and embeddings as a conceptual bag-of-words, we propose a new
generative topic model called Topic-VQ-VAE (TVQ-VAE), which inversely generates
the original documents related to the respective latent codebook. The TVQ-VAE
can visualize the topics with various generative distributions including the
traditional BoW distribution and the autoregressive image generation. Our
experimental results on document analysis and image generation demonstrate that
TVQ-VAE effectively captures the topic context which reveals the underlying
structures of the dataset and supports flexible forms of document generation.
Official implementation of the proposed TVQ-VAE is available at
https://github.com/clovaai/TVQ-VAE.
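The conceptual bag-of-words view can be sketched as follows: each token embedding is snapped to its nearest codebook vector, and a document is summarized by how often each code index occurs. The two-dimensional codebook and embeddings are hypothetical; in a VQ-VAE the codebook is learned and the embeddings would come from a pre-trained model.

```python
from collections import Counter

def nearest_code(vec, codebook):
    """Index of the codebook vector closest to vec (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(vec, codebook[k])))

def codebook_bow(doc_embeddings, codebook):
    """Quantize each embedding and count code usage: a bag-of-words whose
    'words' are codebook entries."""
    return Counter(nearest_code(v, codebook) for v in doc_embeddings)

codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
doc = [(0.9, 0.1), (0.1, 0.05), (1.1, -0.2), (0.2, 0.8)]
print(codebook_bow(doc, codebook))
```

These code counts are what a topic model can then treat like word counts in a classical bag-of-words document.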
( 2
min )
Computational offloading has become an enabling component for edge
intelligence in mobile and smart devices. Existing offloading schemes mainly
focus on mobile devices and servers, while ignoring the potential network
congestion caused by tasks from multiple mobile devices, especially in wireless
multi-hop networks. To fill this gap, we propose a low-overhead,
congestion-aware distributed task offloading scheme by augmenting a distributed
greedy framework with graph-based machine learning. In simulated wireless
multi-hop networks with 20-110 nodes and a resource allocation scheme based on
shortest-path routing and contention-based link scheduling, our approach is
demonstrated to be effective in reducing the congestion and unstable queues
observed under the context-agnostic baseline, while improving execution latency
over local computing.
( 2
min )
This paper revisits a class of convex Finite-Sum Coupled Compositional
Stochastic Optimization (cFCCO) problems with many applications, including
group distributionally robust optimization (GDRO), learning with imbalanced
data, reinforcement learning, and learning to rank. To better solve these
problems, we introduce an efficient single-loop primal-dual block-coordinate
proximal algorithm, dubbed ALEXR. This algorithm leverages block-coordinate
stochastic mirror ascent updates for the dual variable and stochastic proximal
gradient descent updates for the primal variable. We establish the convergence
rates of ALEXR in both convex and strongly convex cases under smoothness and
non-smoothness conditions of involved functions, which not only improve the
best rates in previous works on smooth cFCCO problems but also expand the realm
of cFCCO for solving more challenging non-smooth problems such as the dual form
of GDRO. Finally, we present lower complexity bounds to demonstrate that the
convergence rates of ALEXR are optimal among first-order block-coordinate
stochastic algorithms for the considered class of cFCCO problems.
( 2
min )
Spiking Neural Networks (SNNs) have gained considerable attention due to their
energy-efficient and multiplication-free characteristics. The continuous growth
in scale of deep SNNs poses challenges for model deployment. Network pruning
reduces the hardware resource requirements of model deployment by compressing
the network. However, existing SNN pruning methods incur high pruning costs
and performance loss because the pruning iterations amplify the training
difficulty of SNNs. In this paper, inspired by the critical brain hypothesis in
neuroscience, we propose a regeneration mechanism based on neuron
criticality for SNN pruning to enhance feature extraction and accelerate the
pruning process. First, we propose a low-cost metric for criticality in
SNNs. Then, we re-rank the pruned structures after pruning and regenerate those
with higher criticality to obtain the critical network. Our method achieves
higher performance than the current state-of-the-art (SOTA) method with up to a
95.26% reduction in pruning cost. Moreover, we investigate the underlying
mechanism of our method and find that it efficiently selects potential
structures and learns consistent feature representations.
Using the atomic cluster expansion (ACE) framework, we develop a machine
learning interatomic potential for fast and accurate modelling of the phonon
transport properties of wurtzite aluminum nitride. The predictive power of the
ACE potential against density functional theory (DFT) is demonstrated across a
broad range of properties of w-AlN, including ground-state lattice parameters,
specific heat capacity, coefficients of thermal expansion, bulk modulus, and
harmonic phonon dispersions. Validation of lattice thermal conductivity is
further carried out by comparing the ACE-predicted values to the DFT
calculations and experiments, demonstrating that our ACE potential adequately
describes anharmonic phonon interactions. As a
practical application, we perform a lattice dynamics analysis using the
potential to unravel the effects of biaxial strain on the thermal conductivity
and phonon properties of w-AlN, identifying biaxial strain as a significant
tuning factor for near-junction thermal design of w-AlN-based electronics.
We sample from a given target distribution by constructing a neural network
which maps samples from a simple reference, e.g. the standard normal
distribution, to samples from the target. To that end, we propose using a
neural network architecture inspired by the Langevin Monte Carlo (LMC)
algorithm. Based on LMC perturbation results, we show approximation rates of
the proposed architecture for smooth, log-concave target distributions measured
in the Wasserstein-$2$ distance. The analysis heavily relies on the notion of
sub-Gaussianity of the intermediate measures of the perturbed LMC process. In
particular, we derive bounds on the growth of the intermediate variance proxies
under different assumptions on the perturbations. Moreover, we propose an
architecture similar to deep residual neural networks and derive expressivity
results for approximating the sample to target distribution map.
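For intuition, the sketch below runs plain unadjusted Langevin Monte Carlo, the algorithm the proposed architecture is modeled on, in one dimension. It is not the paper's network: the Gaussian target N(2, 0.5), the step size, and the number of steps are illustrative assumptions. Each step applies the standard LMC update, pushing a standard-normal reference sample towards the target.

```python
import math
import random


def grad_log_target(x, mean=2.0, var=0.5):
    """Score (gradient of the log-density) of a hypothetical 1-D target N(mean, var)."""
    return -(x - mean) / var


def lmc_push_forward(z, grad_log_pi, step=0.01, n_steps=400, rng=random):
    """Map a reference sample z towards the target with unadjusted Langevin steps.

    Each step applies x <- x + step * grad log pi(x) + sqrt(2 * step) * xi,
    where xi is fresh standard Gaussian noise.
    """
    x = z
    for _ in range(n_steps):
        xi = rng.gauss(0.0, 1.0)
        x = x + step * grad_log_pi(x) + math.sqrt(2.0 * step) * xi
    return x


rng = random.Random(0)
samples = [lmc_push_forward(rng.gauss(0.0, 1.0), grad_log_target, rng=rng)
           for _ in range(1000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"sample mean {mean:.2f}, sample variance {var:.2f}")
```

The printed moments should be close to the target's mean 2 and variance 0.5; the small residual bias in the variance comes from the time discretization, which is exactly the kind of perturbation the approximation analysis has to control.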
In recent years, significant progress in generative AI has highlighted the
important role of physics-inspired models that utilize advanced mathematical
concepts based on fundamental physics principles to enhance artificial
intelligence capabilities. Among these models, those based on diffusion
equations have greatly improved image quality. This study aims to explore the
potential uses of the Maxwell-Boltzmann equation, which forms the basis of the
kinetic theory of gases, and the Michaelis-Menten model in Marketing Mix
Modelling (MMM) applications. We propose incorporating these equations into
Hierarchical Bayesian models to analyse consumer behaviour in the context of
advertising. These equation sets excel in accurately describing the random
dynamics in complex systems like social interactions and consumer-advertising
interactions.
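The abstract does not give the exact functional forms used, but the standard Michaelis-Menten curve, V_max * S / (K_m + S), is the usual way to encode a saturating (diminishing-returns) response to advertising spend. The minimal sketch below evaluates that curve only; how it enters the hierarchical Bayesian model (priors, per-channel parameters) is not described in the abstract, and the parameter names `v_max` and `k_half` are hypothetical.

```python
def michaelis_menten(spend, v_max, k_half):
    """Saturating response v_max * spend / (k_half + spend).

    v_max: maximum achievable response (hypothetical parameter name).
    k_half: spend at which the response reaches half of v_max.
    """
    if spend < 0:
        raise ValueError("spend must be non-negative")
    return v_max * spend / (k_half + spend)


# Diminishing returns: each extra 50 units of spend adds less response than the last.
for s in (0, 50, 100, 150):
    print(s, round(michaelis_menten(s, v_max=100.0, k_half=50.0), 1))
```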
In this paper, we present conditions for identifying the generator of a
linear stochastic differential equation (SDE) from the distribution of its
solution process with a given fixed initial state. These identifiability
conditions are crucial in causal inference using linear SDEs as they enable the
identification of the post-intervention distributions from the observational
distribution. Specifically, we derive a sufficient and necessary condition for
identifying the generator of linear SDEs with additive noise, as well as a
sufficient condition for identifying the generator of linear SDEs with
multiplicative noise. We show that the conditions derived for both types of
SDEs are generic. Moreover, we offer geometric interpretations of the derived
identifiability conditions to enhance their understanding. To validate our
theoretical results, we perform a series of simulations, which support and
substantiate the established findings.
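Simulation studies of this kind typically generate paths with the Euler-Maruyama scheme. The minimal sketch below does this for the scalar, additive-noise case dX = a X dt + g dW. The coefficients, horizon, and path count are illustrative assumptions, not the paper's experimental setup. Identifiability asks whether the generator (here determined by a and g^2) can be recovered from the distribution of such paths. As a sanity check, the Monte Carlo mean is compared with the known value E[X_t] = x0 * exp(a * t).

```python
import math
import random


def simulate_linear_sde(a, g, x0, t_end, n_steps, rng):
    """Euler-Maruyama endpoint of the scalar linear SDE dX = a X dt + g dW (additive noise)."""
    dt = t_end / n_steps
    x = x0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment ~ N(0, dt)
        x = x + a * x * dt + g * dw
    return x


rng = random.Random(1)
finals = [simulate_linear_sde(a=-1.0, g=0.5, x0=1.0, t_end=1.0, n_steps=100, rng=rng)
          for _ in range(2000)]
mc_mean = sum(finals) / len(finals)
# For this SDE, E[X_t] = x0 * exp(a * t), so the Monte Carlo mean should be close to exp(-1).
print(round(mc_mean, 3), round(math.exp(-1.0), 3))
```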
We establish finite-sample guarantees for efficient proper learning of
bounded-degree polytrees, a rich class of high-dimensional probability
distributions and a subclass of Bayesian networks, a widely-studied type of
graphical model. Recently, Bhattacharyya et al. (2021) obtained finite-sample
guarantees for recovering tree-structured Bayesian networks, i.e., 1-polytrees.
We extend their results by providing an efficient algorithm which learns
$d$-polytrees in polynomial time and sample complexity for any bounded $d$ when
the underlying undirected graph (skeleton) is known. We complement our
algorithm with an information-theoretic sample complexity lower bound, showing
that the dependence on the dimension and target accuracy parameters is nearly
tight.
This study preprocessed 2000-2019 energy consumption data for 46 key Sichuan
industries using matrix normalization. DBSCAN clustering identified 16 feature
classes to objectively group industries. Penalized regression models were then
applied for their advantages in overfitting control, high-dimensional data
processing, and feature selection, which make them well suited to the complex energy data.
Results showed that the second, coal-centered cluster had the highest emissions due to
production needs. Emissions from gasoline-focused and coke-focused clusters
were also significant. Based on this, emission reduction suggestions included
clean coal technologies, transportation management, coal-electricity
replacement in steel, and industry standardization. The research introduced
unsupervised learning to objectively select factors and aimed to explore new
emission reduction avenues. In summary, the study identified industry
groupings, assessed emissions drivers, and proposed scientific reduction
strategies to better inform decision-making using algorithms like DBSCAN and
penalized regression models.
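The abstract does not report the distance measure or the `eps`/`min_samples` settings used for DBSCAN, so the following is only a minimal, dependency-free sketch of standard DBSCAN on hypothetical toy points. In practice a library implementation such as scikit-learn's `DBSCAN` would be used. Points with at least `min_samples` neighbours (counting themselves) within `eps` are core points; clusters grow from core points, and points reachable from no core point are labelled noise (-1).

```python
import math


def dbscan(points, eps, min_samples):
    """Return one cluster label per point; -1 marks noise."""
    n = len(points)
    labels = [None] * n  # None = not yet visited

    def neighbors(i):
        # Brute-force O(n) neighbourhood query (fine for small data)
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_samples:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_samples:
                queue.extend(j_nbrs)  # j is a core point: keep expanding
    return labels


# Hypothetical 2-D feature vectors: two tight groups and one outlier.
toy_points = [(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (5.1, 5), (10, 10)]
toy_labels = dbscan(toy_points, eps=0.5, min_samples=2)
print(toy_labels)
```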
Large language models (LLMs) have significantly improved the ability to
perform tasks in the field of code generation. However, there is still a gap
between LLMs being capable coders and being top-tier software engineers. Based
on the observation that top-level software engineers often ask clarifying
questions to reduce ambiguity in both requirements and coding solutions, I
argue that the same should be applied to LLMs for code generation tasks. By
asking probing questions on various topics before generating the final code,
the challenges of programming with LLMs, such as unclear intent specification,
lack of computational thinking, and undesired code quality, may be alleviated.
This, in turn, increases confidence in the generated code. In this work, I
explore how to leverage better communication skills to achieve greater
confidence in generated code. I propose a communication-centered process that
uses an LLM-generated communicator to identify issues with high ambiguity or
low confidence in problem descriptions and generated code. I then ask
clarifying questions to obtain responses from users for refining the code.
We study online multiclass classification under bandit feedback. We extend
the results of Daniely and Helbertal [2013] by showing that the finiteness of
the Bandit Littlestone dimension is necessary and sufficient for bandit online
learnability even when the label space is unbounded. Moreover, we show that,
unlike the full-information setting, sequential uniform convergence is
necessary but not sufficient for bandit online learnability. Our result
complements the recent work by Hanneke, Moran, Raman, Subedi, and Tewari [2023]
who show that the Littlestone dimension characterizes online multiclass
learnability in the full-information setting even when the label space is
unbounded.
This paper presents a deep reinforcement learning solution for optimizing
multi-UAV cell-association decisions and their moving velocity on a 3D aerial
highway. The objective is to enhance transportation and communication
performance, including collision avoidance, connectivity, and handovers. The
problem is formulated as a Markov decision process (MDP) with UAVs' states
defined by velocities and communication data rates. We propose a neural
architecture with a shared decision module and multiple network branches, each
dedicated to a specific action dimension in a 2D transportation-communication
space. This design efficiently handles the multi-dimensional action space by
letting each action dimension be chosen independently. We introduce two
models, Branching Dueling Q-Network (BDQ) and Branching Dueling Double Deep
Q-Network (Dueling DDQN), to demonstrate the approach. Simulation results show
a significant improvement of 18.32% compared to existing benchmarks.
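The branching design can be illustrated with the dueling aggregation commonly used in branching Q-network architectures: a shared state value V(s) is combined with mean-centred advantages from each branch, Q_d(s, a) = V(s) + A_d(s, a) - mean over a' of A_d(s, a'). The abstract does not spell out the paper's exact aggregation, so this minimal sketch assumes the mean-subtraction form. The numbers, and reading the two branches as a velocity choice and a cell-association choice, are hypothetical.

```python
def branch_q_values(state_value, advantages):
    """Combine a shared state value with per-branch advantages.

    For each action dimension d: Q_d(s, a) = V(s) + A_d(s, a) - mean_a' A_d(s, a').
    """
    q = []
    for adv in advantages:
        mean_adv = sum(adv) / len(adv)
        q.append([state_value + a - mean_adv for a in adv])
    return q


def greedy_actions(q_branches):
    """Each branch picks its own argmax, so cost grows linearly in the number of dimensions."""
    return [max(range(len(qb)), key=qb.__getitem__) for qb in q_branches]


# Hypothetical outputs: branch 0 has 3 velocity options, branch 1 has 2 candidate cells.
q = branch_q_values(0.5, [[0.0, 2.0, 1.0], [3.0, -1.0]])
print(q, greedy_actions(q))
```

Selecting each dimension independently avoids enumerating the joint action space, whose size is the product of the branch sizes.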
Deep Neural Networks (DNNs) have emerged as an effective approach to tackling
real-world problems. However, like human-written software, DNNs can have bugs
and can be attacked. To address this, research has explored a wide range of
algorithmic approaches to verify DNN behavior. In this work, we introduce
NeuralSAT, a new verification approach that adapts the DPLL(T) algorithm
widely used in modern SMT solvers. A key feature of SMT solvers is the use
of conflict clause learning and search restart to scale verification. Unlike
prior DNN verification approaches, NeuralSAT combines an abstraction-based
deductive theory solver with clause learning, and an evaluation on a set of
challenging verification benchmarks clearly demonstrates the benefits of the
approach.
Bilevel programming has emerged as a valuable tool for hyperparameter
selection, a central concern in machine learning. In a recent study by Ye et
al. (2023), a value function-based difference of convex algorithm was
introduced to address bilevel programs. This approach proves particularly
powerful when dealing with scenarios where the lower-level problem exhibits
convexity in both the upper-level and lower-level variables. Examples of such
scenarios include support vector machines and $\ell_1$ and $\ell_2$ regularized
regression. In this paper, we significantly expand the range of applications,
now requiring convexity only in the lower-level variables of the lower-level
program. We present an innovative single-level difference of weakly convex
reformulation based on the Moreau envelope of the lower-level problem. We
further develop a sequentially convergent Inexact Proximal Difference of Weakly
Convex Algorithm (iP-DwCA). To evaluate the effectiveness of the proposed
iP-DwCA, we conduct numerical experiments focused on tuning hyperparameters for
kernel support vector machines on simulated data.
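To show what a Moreau envelope does, here is a one-dimensional illustration: for f = |.|, the envelope e_gamma f(x) = min over y of { |y| + (x - y)^2 / (2 gamma) } is attained at the soft-thresholding (proximal) point and equals the Huber function. This is not the paper's reformulation, which applies the envelope to the lower-level problem of a bilevel program. It only demonstrates, under that simplification, how the envelope turns a nonsmooth function into a smooth one with a closed form.

```python
def prox_abs(x, gamma):
    """Proximal operator of gamma * |.| (soft thresholding)."""
    if x > gamma:
        return x - gamma
    if x < -gamma:
        return x + gamma
    return 0.0


def moreau_envelope_abs(x, gamma):
    """e_gamma(x) = min_y |y| + (x - y)^2 / (2 gamma), evaluated at the prox point."""
    y = prox_abs(x, gamma)
    return abs(y) + (x - y) ** 2 / (2.0 * gamma)


def huber(x, gamma):
    """Closed form of the envelope: quadratic near 0, linear (shifted by gamma / 2) beyond."""
    return x * x / (2.0 * gamma) if abs(x) <= gamma else abs(x) - gamma / 2.0


for x in (-3.0, -0.5, 0.0, 0.25, 2.0):
    print(x, moreau_envelope_abs(x, 1.0), huber(x, 1.0))
```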
Federated learning (FL) has garnered considerable attention due to its
privacy-preserving feature. Nonetheless, the lack of freedom in managing user
data can lead to group fairness issues, where models are biased towards
sensitive factors such as race or gender. To tackle this issue, this paper
proposes a novel algorithm, fair federated averaging with augmented Lagrangian
method (FFALM), designed explicitly to address group fairness issues in FL.
Specifically, we impose a fairness constraint on the training objective and
solve the minimax reformulation of the constrained optimization problem. Then,
we derive a theoretical upper bound on the convergence rate of FFALM. The
effectiveness of FFALM in improving fairness is shown empirically on CelebA and
UTKFace datasets in the presence of severe statistical heterogeneity.
The posterior collapse phenomenon in variational autoencoder (VAE), where the
variational posterior distribution closely matches the prior distribution, can
hinder the quality of the learned latent variables. As a consequence of
posterior collapse, the latent variables extracted by the encoder in VAE
preserve less information from the input data and thus fail to produce
meaningful representations as input to the reconstruction process in the
decoder. While this phenomenon has been an actively addressed topic related to
VAE performance, the theory for posterior collapse remains underdeveloped,
especially beyond the standard VAE. In this work, we advance the theoretical
understanding of posterior collapse to two important and prevalent yet less
studied classes of VAE: conditional VAE and hierarchical VAE. Specifically, via
a non-trivial theoretical analysis of linear conditional VAE and hierarchical
VAE with two levels of latent variables, we prove that the causes of posterior
collapse in these models include the correlation between the input and output of the
conditional VAE and the effect of learnable encoder variance in the
hierarchical VAE. We empirically validate our theoretical findings for linear
conditional and hierarchical VAE and demonstrate that these results are also
predictive for non-linear cases with extensive experiments.
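The paper's analysis is theoretical. A common empirical diagnostic for posterior collapse in Gaussian VAEs is the per-dimension KL divergence between the encoder posterior N(mu, sigma^2) and the N(0, 1) prior: a dimension whose KL stays near zero carries almost no information about the input. The minimal sketch below computes that closed-form KL. The threshold `tol` and the toy encoder outputs are hypothetical, and this is not the paper's method.

```python
import math


def kl_to_standard_normal(mu, sigma):
    """KL( N(mu, sigma^2) || N(0, 1) ) for one latent dimension."""
    return 0.5 * (sigma ** 2 + mu ** 2 - 1.0 - math.log(sigma ** 2))


def collapsed_dims(posteriors, tol=1e-2):
    """posteriors[d] is a list of (mu, sigma) pairs, one per input example.

    A dimension is flagged as collapsed when its average KL to the prior is
    below the (hypothetical) threshold tol, i.e. the posterior ~ the prior.
    """
    flagged = []
    for d, pairs in enumerate(posteriors):
        avg_kl = sum(kl_to_standard_normal(m, s) for m, s in pairs) / len(pairs)
        if avg_kl < tol:
            flagged.append(d)
    return flagged


# Toy encoder outputs for two examples: dimension 0 ignores the input, dimension 1 does not.
toy_posteriors = [[(0.0, 1.0), (0.01, 0.99)], [(1.5, 0.3), (-1.2, 0.4)]]
print(collapsed_dims(toy_posteriors))
```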
This document presents a stock market analysis conducted on a dataset
consisting of 750 instances and 16 attributes donated on 2014-10-23. The
analysis includes an exploratory data analysis (EDA) section, feature
engineering, data preparation, model selection, and insights from the analysis.
The Fama French 3-factor model is also utilized in the analysis. The results of
the analysis are presented, with linear regression being the best-performing
model.
In the era of information proliferation, discerning the credibility of news
content poses an ever-growing challenge. This paper introduces RELIANCE, a
pioneering ensemble learning system designed for robust information and fake
news credibility evaluation. Comprising five diverse base models, including
Support Vector Machine (SVM), naive Bayes, logistic regression, random forest,
and Bidirectional Long Short Term Memory Networks (BiLSTMs), RELIANCE employs
an innovative approach to integrate their strengths, harnessing the collective
intelligence of the ensemble for enhanced accuracy. Experiments demonstrate the
superiority of RELIANCE over individual models, indicating its efficacy in
distinguishing between credible and non-credible information sources. RELIANCE
also surpasses baseline models in information and news credibility assessment,
establishing itself as an effective solution for evaluating the reliability of
information sources.
With the fast development of Deep Learning techniques, Named Entity
Recognition (NER) is becoming more and more important in the information
extraction task. The greatest difficulty in NER is maintaining detection
performance when the named-entity types and documents are unfamiliar. Realizing
that the specificity information may contain potential meanings of a word and
generate semantic-related features for word embedding, we develop a
distribution-aware word embedding and implement three different methods to make
use of the distribution information in an NER framework. The results show
that NER performance improves when word specificity is
incorporated into existing NER methods.
The question of what makes a data distribution suitable for deep learning is
a fundamental open problem. Focusing on locally connected neural networks (a
prevalent family of architectures that includes convolutional and recurrent
neural networks as well as local self-attention models), we address this
problem by adopting theoretical tools from quantum physics. Our main
theoretical result states that a certain locally connected neural network is
capable of accurate prediction over a data distribution if and only if the data
distribution admits low quantum entanglement under certain canonical partitions
of features. As a practical application of this result, we derive a
preprocessing method for enhancing the suitability of a data distribution to
locally connected neural networks. Experiments with widespread models over
various datasets corroborate our findings. We hope that our use of quantum
entanglement will encourage further adoption of tools from physics for formally
reasoning about the relation between deep learning and real-world data.
Sutton, Szepesvári and Maei introduced the first gradient
temporal-difference (GTD) learning algorithms compatible with both linear
function approximation and off-policy training. The goal of this paper is (a)
to propose some variants of GTDs with extensive comparative analysis and (b) to
establish new theoretical analysis frameworks for the GTDs. These variants are
based on convex-concave saddle-point interpretations of GTDs, which effectively
unify all the GTDs into a single framework, and provide simple stability
analysis based on recent results on primal-dual gradient dynamics. Finally,
a numerical comparison is given to evaluate these approaches.
Continual learning aims to train a model incrementally on a sequence of tasks
without forgetting previous knowledge. Although continual learning has been
widely studied in computer vision, its application to Vision+Language tasks is
not that straightforward, as settings can be parameterized in multiple ways
according to their input modalities. In this paper, we present a detailed study
of how different settings affect performance for Visual Question Answering. We
first propose three plausible task formulations and demonstrate their impact on
the performance of continual learning algorithms. We break down several factors
of task similarity, showing that performance and sensitivity to task order
depend strongly on the shift of the output distribution. We also investigate the
potential of pretrained models and compare the robustness of transformer models
with different visual embeddings. Finally, we provide an analysis interpreting
model representations and their impact on forgetting. Our results highlight the
importance of stabilizing visual representations in deeper layers.
A code generation model generates code by taking a prompt from a code
comment, existing code, or a combination of both. Although code generation
models (e.g., GitHub Copilot) are increasingly being adopted in practice, it is
unclear whether they can successfully be used for unit test generation without
fine-tuning for a strongly typed language like Java. To fill this gap, we
investigated how well three models (Codex, GPT-3.5-Turbo, and StarCoder) can
generate unit tests. We used two benchmarks (HumanEval and EvoSuite SF110) to
investigate the effect of context generation on the unit test generation
process. We evaluated the models based on compilation rates, test correctness,
test coverage, and test smells. We found that the Codex model achieved above
80% coverage for the HumanEval dataset, but no model had more than 2% coverage
for the EvoSuite SF110 benchmark. The generated tests also suffered from test
smells, such as Duplicated Asserts and Empty Tests.
There has been considerable recent interest in estimating heterogeneous
causal effects. In this paper, we introduce conditional average partial causal
effects (CAPCE) to reveal the heterogeneity of causal effects with continuous
treatment. We provide conditions for identifying CAPCE in an instrumental
variable setting. We develop three families of CAPCE estimators: sieve,
parametric, and reproducing kernel Hilbert space (RKHS)-based, and analyze
their statistical properties. We illustrate the proposed CAPCE estimators on
synthetic and real-world data.
Identifying important features linked to a response variable is a fundamental
task in various scientific domains. This article explores statistical inference
for simulated Markov random fields in high-dimensional settings. We introduce a
methodology based on Markov Chain Monte Carlo Maximum Likelihood Estimation
(MCMC-MLE) with Elastic-net regularization. Under mild conditions on the MCMC
method, our penalized MCMC-MLE method achieves $\ell_{1}$-consistency. We
propose a decorrelated score test, establishing both its asymptotic normality
and that of a one-step estimator, along with the associated confidence
interval. Furthermore, we construct two false discovery rate control procedures
via the asymptotic behaviors for both p-values and e-values. Comprehensive
numerical simulations confirm the theoretical validity of the proposed methods.
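The abstract does not say which step-up rules its two FDR procedures use once the p-values and e-values are formed. The generic rules such values are usually plugged into are Benjamini-Hochberg (for p-values) and e-BH (for e-values). The minimal sketch below assumes those standard rules; the paper's procedures may differ, and the example inputs are hypothetical.

```python
def benjamini_hochberg(p_values, alpha):
    """Indices rejected by BH: reject the k smallest p-values, k = max{k : p_(k) <= k * alpha / m}."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            k_max = rank  # step-up: keep the largest rank that passes
    return sorted(order[:k_max])


def e_benjamini_hochberg(e_values, alpha):
    """Indices rejected by e-BH: reject the k largest e-values, k = max{k : e_[k] >= m / (alpha * k)}."""
    m = len(e_values)
    order = sorted(range(m), key=lambda i: e_values[i], reverse=True)
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if e_values[i] >= m / (alpha * rank):
            k_max = rank
    return sorted(order[:k_max])


print(benjamini_hochberg([0.01, 0.02, 0.03, 0.5], alpha=0.05))
print(e_benjamini_hochberg([100.0, 50.0, 1.0, 0.5], alpha=0.1))
```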
In modern radar systems, precise target localization using azimuth and
velocity estimation is paramount. Traditional unbiased estimation methods have
leveraged gradient descent algorithms to reach the theoretical limits of the
Cramér-Rao Bound (CRB) for the error of the parameter estimates. In this
study, we present a data-driven neural network approach that outperforms these
traditional techniques, demonstrating improved accuracies in target azimuth and
velocity estimation. Using a representative simulated scenario, we show that
our proposed neural network model consistently achieves improved parameter
estimates: because it is inherently biased, it is not bound by the CRB (which
constrains unbiased estimators) and attains a lower mean squared error (MSE).
Our findings underscore the potential of employing deep
learning methods in radar systems, paving the way for more accurate
localization in cluttered and dynamic environments.
We present the new Orthogonal Polynomials Approximation Algorithm (OPAA), a
parallelizable algorithm that estimates probability distributions using
a functional analytic approach: first, it finds a smooth functional estimate of
the probability distribution, whether it is normalized or not; second, the
algorithm provides an estimate of the normalizing weight; and third, the
algorithm proposes a new computation scheme to compute such estimates.
A core component of OPAA is a special transform of the square root of the
joint distribution into a special functional space of our own construction. Through
this transform, the evidence is equated with the $L^2$ norm of the transformed
function, squared. Hence, the evidence can be estimated by the sum of squares
of the transform coefficients. Computations can be parallelized and completed
in one pass.
OPAA can be applied broadly to the estimation of probability density
functions. In Bayesian problems, it can be applied to estimating the
normalizing weight of the posterior, which is also known as the evidence,
serving as an alternative to existing optimization-based methods.
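The identity at the heart of this approach is Parseval's theorem: if the coefficients c_k of sqrt(f) in an orthonormal basis are known, then the integral of f equals the sum of the c_k squared. OPAA's own transform and function space are not described in the abstract. The minimal sketch below therefore swaps in the ordinary Fourier basis on a truncated interval [-L, L] and a hypothetical unnormalized Gaussian target, whose true evidence is sqrt(2 pi). Each coefficient is computed independently, which is what makes a single parallel pass possible.

```python
import math


def unnormalized_density(x):
    """Hypothetical unnormalized target exp(-x^2 / 2); its true evidence is sqrt(2 pi)."""
    return math.exp(-0.5 * x * x)


def evidence_via_parseval(f, half_width=10.0, n_modes=30, n_grid=2000):
    """Estimate the integral of f over [-L, L] as sum_k c_k^2, where c_k are the
    coefficients of sqrt(f) in the orthonormal Fourier basis of L^2([-L, L])."""
    L = half_width
    h = 2.0 * L / n_grid
    xs = [-L + (i + 0.5) * h for i in range(n_grid)]  # midpoint quadrature nodes
    root = [math.sqrt(f(x)) for x in xs]

    def coeff(basis):
        # c = <sqrt(f), basis>, approximated by the midpoint rule
        return h * sum(r * basis(x) for r, x in zip(root, xs))

    total = coeff(lambda x: 1.0 / math.sqrt(2.0 * L)) ** 2
    for k in range(1, n_modes + 1):
        w = k * math.pi / L
        total += coeff(lambda x: math.cos(w * x) / math.sqrt(L)) ** 2
        total += coeff(lambda x: math.sin(w * x) / math.sqrt(L)) ** 2
    return total


estimate = evidence_via_parseval(unnormalized_density)
print(estimate, math.sqrt(2.0 * math.pi))
```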
In non-asymptotic learning, variance-type parameters of sub-Gaussian
distributions are of paramount importance. However, directly estimating these
parameters using the empirical moment generating function (MGF) is infeasible.
To address this, we suggest using the sub-Gaussian intrinsic moment norm
[Buldygin and Kozachenko (2000), Theorem 1.3] achieved by maximizing a sequence
of normalized moments. Significantly, the suggested norm can not only
reconstruct the exponential moment bounds of MGFs but also provide tighter
sub-Gaussian concentration inequalities. In practice, we provide an intuitive
method for assessing whether data with a finite sample size is sub-Gaussian,
utilizing the sub-Gaussian plot. The intrinsic moment norm can be robustly
estimated via a simple plug-in approach. Our theoretical findings are also
applicable to reinforcement learning, including the multi-armed bandit
scenario.
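As we read the Buldygin and Kozachenko definition, for a zero-mean X the intrinsic moment norm is the maximum over k >= 1 of [2^k k! / (2k)! * E X^(2k)]^(1/(2k)). The constant is chosen so that a standard Gaussian has norm 1, since its even moments are exactly (2k)! / (2^k k!). The minimal plug-in sketch below replaces each expectation with a sample average and truncates the maximum at a hypothetical `max_k`. Both the truncation and the exact constant are assumptions to check against the paper.

```python
import math


def intrinsic_moment_norm(sample, max_k=10):
    """Plug-in estimate of max_k [ 2^k k! / (2k)! * E X^{2k} ]^{1/(2k)}.

    Assumed definition (zero-mean data); max_k is a hypothetical truncation.
    """
    n = len(sample)
    best = 0.0
    for k in range(1, max_k + 1):
        moment = sum(x ** (2 * k) for x in sample) / n  # empirical E X^{2k}
        scale = 2 ** k * math.factorial(k) / math.factorial(2 * k)
        best = max(best, (scale * moment) ** (1.0 / (2 * k)))
    return best


# A Rademacher-like sample (+1/-1): the k = 1 term attains the maximum, giving 1.
print(intrinsic_moment_norm([-1.0, 1.0] * 50))
```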
The manifold scattering transform is a deep feature extractor for data
defined on a Riemannian manifold. It is one of the first examples of extending
convolutional neural network-like operators to general manifolds. The initial
work on this model focused primarily on its theoretical stability and
invariance properties but did not provide methods for its numerical
implementation except in the case of two-dimensional surfaces with predefined
meshes. In this work, we present practical schemes, based on the theory of
diffusion maps, for applying the manifold scattering transform to datasets
arising in naturalistic systems, such as single cell genetics, where the data
is a high-dimensional point cloud modeled as lying on a low-dimensional
manifold. We show that our methods are effective for signal classification and
manifold classification tasks.
In recent times, machine learning methods have made significant advances in
becoming a useful tool for analyzing physical systems. A particularly active
area in this theme has been "physics-informed machine learning" which focuses
on using neural nets for numerically solving differential equations. In this
work, we aim to advance the theory of measuring out-of-sample error while
training DeepONets -- which is among the most versatile ways to solve PDE
systems in one shot.
Firstly, for a class of DeepONets, we prove a bound on their Rademacher
complexity which does not explicitly scale with the width of the nets involved.
Secondly, we use this to show how the Huber loss can be chosen so that, for
these DeepONet classes, one obtains generalization error bounds with no
explicit dependence on the size of the nets. We note that our theoretical
results apply to any PDE that DeepONets are used to solve.
Unsupervised video object learning seeks to decompose video scenes into
structural object representations without any supervision from depth, optical
flow, or segmentation. We present VONet, an innovative approach that is
inspired by MONet. While utilizing a U-Net architecture, VONet employs an
efficient and effective parallel attention inference process, generating
attention masks for all slots simultaneously. Additionally, to enhance the
temporal consistency of each mask across consecutive video frames, VONet
develops an object-wise sequential VAE framework. The integration of these
innovative encoder-side techniques, in conjunction with an expressive
transformer-based decoder, establishes VONet as the leading unsupervised method
for object learning across five MOVI datasets, encompassing videos of diverse
complexities. Code is available at https://github.com/hnyu/vonet.
Document set expansion (DSE) aims to identify relevant documents from a large
collection based on a small set of documents that are on a fine-grained topic.
Previous work shows that positive-unlabeled (PU) learning is a promising method
for this task. However, some serious issues remain unresolved, namely typical
challenges that PU methods suffer from, such as an unknown class prior and
imbalanced data, and the need for transductive experimental settings. In this
paper, we propose a novel PU learning framework based on density estimation,
called puDE, that can handle the above issues. The advantage of puDE is that
it is neither constrained to the SCAR assumption nor requires any class prior
knowledge. We demonstrate the
effectiveness of the proposed method using a series of real-world datasets and
conclude that our method is a better alternative for the DSE task.
Constructing first principles models is a challenging task for nonlinear and
complex systems such as a wastewater treatment unit. In recent years,
data-driven models have been widely used to overcome this complexity. However,
they often suffer from missing, low-quality, or noisy data. Transfer learning
addresses this issue by transferring knowledge from another task to the target
task to increase prediction performance. In this work, the objective is to
increase the prediction performance of an industrial
wastewater treatment plant by transferring the knowledge of (i) an open-source
simulation model that captures the underlying physics of the process, albeit
with dissimilarities to the target plant, (ii) another industrial plant
characterized by noisy and limited data but located in the same refinery, and
(iii) the model in (ii) combined with a physics-informed training objective,
where the physics information is derived from the open-source model in (i).
The results show that test and validation performance improve by up to 27%
and 59%, respectively.
Ensemble defenses are widely employed in various security-related
applications to enhance model performance and robustness. The widespread
adoption of these techniques also raises many questions: Are general ensemble
defenses guaranteed to be more robust than individual models? Will stronger adaptive
attacks defeat existing ensemble defense strategies as the cybersecurity arms
race progresses? Can ensemble defenses achieve adversarial robustness to
different types of attacks simultaneously and resist the continually adjusted
adaptive attacks? Unfortunately, these critical questions remain unresolved as
there are no platforms for comprehensive evaluation of ensemble adversarial
attacks and defenses in the cybersecurity domain. In this paper, we propose a
general Cybersecurity Adversarial Robustness Evaluation (CARE) platform aiming
to bridge this gap.
Variational families with full-rank covariance approximations are known not
to work well in black-box variational inference (BBVI), both empirically and
theoretically. In fact, recent computational complexity results for BBVI have
established that full-rank variational families scale poorly with the
dimensionality of the problem compared to, e.g., mean-field families. This is
particularly critical for hierarchical Bayesian models with local variables;
their dimensionality increases with the size of the datasets. Consequently, one
gets an iteration complexity with an explicit $\mathcal{O}(N^2)$ dependence on
the dataset size $N$. In this paper, we explore a theoretical middle ground
between mean-field variational families and full-rank families: structured
variational families. We rigorously prove that certain scale matrix structures
can achieve a better iteration complexity of $\mathcal{O}(N)$, implying better
scaling with respect to $N$. We empirically verify our theoretical results on
large-scale hierarchical models.
The applicability of widely adopted machine learning (ML) methods to
classification is circumscribed by the imperatives of explicability and
uncertainty, particularly evident in domains such as healthcare, behavioural
sciences, and finance, where accountability takes priority. Recently,
Small and Incomplete Dataset Analyser (SaNDA) has been proposed to enhance the
ability to perform classification in such domains, by developing a data
abstraction protocol using a ROC curve-based method. This paper focuses on
column-wise data transformations called abstractions, which are crucial for
SaNDA's classification process and explores alternative abstraction protocols,
such as constant binning and quantiles. The best-performing methods have been
compared against Random Forest as a baseline for explainable methods. The
results suggest that SaNDA can be a viable substitute for Random Forest when
data is incomplete, even with minimal missing values. It consistently maintains
high accuracy even when half of the dataset is missing, unlike Random Forest
which experiences a significant decline in accuracy under similar conditions.
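The abstract names constant binning and quantiles as alternative column-wise abstractions but does not define them. The minimal sketch below assumes "constant binning" means equal-width bins and "quantiles" means equal-frequency bins. Cut points are computed from the observed values only, and missing entries (`None`) are passed through unchanged; that treatment of missing values is also an assumption, not SaNDA's documented behaviour.

```python
import bisect


def equal_width_edges(values, n_bins):
    """Interior cut points splitting [min, max] into n_bins bins of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]


def quantile_edges(values, n_bins):
    """Interior cut points at (approximate) empirical quantiles."""
    s = sorted(values)
    return [s[(i * len(s)) // n_bins] for i in range(1, n_bins)]


def abstract_column(column, edge_fn, n_bins):
    """Replace each observed value by its bin index; missing entries (None) stay missing."""
    observed = [v for v in column if v is not None]
    edges = edge_fn(observed, n_bins)
    return [None if v is None else bisect.bisect_right(edges, v) for v in column]


column = [1, 2, None, 3, 4, 10, None, 5, 6]
print(abstract_column(column, equal_width_edges, 2))
print(abstract_column(column, quantile_edges, 2))
```

The outlier 10 stretches the equal-width bins so that most values share bin 0, while the quantile abstraction splits the observed values roughly in half. This is one reason the choice of abstraction protocol matters.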
Recently proposed methods for implicitly representing signals such as images,
scenes, or geometries using coordinate-based neural network architectures often
do not leverage the choice of activation functions, or do so only to a limited
extent. In this paper, we introduce the Hyperbolic Oscillation function (HOSC),
a novel activation function with a controllable sharpness parameter. Unlike
previous activations, HOSC has been specifically designed to better capture
sudden changes in the input signal, and hence sharp or acute features of the
underlying data, as well as smooth low-frequency transitions. Due to its
simplicity and modularity, HOSC offers a plug-and-play functionality that can
be easily incorporated into any existing method employing a neural network as a
way of implicitly representing a signal. We benchmark HOSC against other
popular activations in an array of general tasks, empirically showing an
improvement in the quality of obtained representations, provide the
mathematical motivation behind the efficacy of HOSC, and discuss its
limitations.
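The abstract does not give HOSC's formula. Here we assume the form $\mathrm{HOSC}(x;\beta) = \tanh(\beta \sin x)$, which is our reading of the HOSC definition and should be treated as an assumption. Under that form, the sketch below shows how the sharpness parameter $\beta$ moves the activation from a scaled sine (small $\beta$, smooth) to a square wave (large $\beta$, abrupt transitions).

```python
import numpy as np

def hosc(x, beta=1.0):
    """Hyperbolic oscillation activation, assumed form tanh(beta * sin(x))."""
    return np.tanh(beta * np.sin(x))

x = np.array([-np.pi / 2, 0.0, np.pi / 6, np.pi / 2])
print(hosc(x, beta=0.01) / 0.01)  # close to sin(x): smooth, low-frequency behaviour
print(hosc(x, beta=100.0))        # close to sign(sin(x)): sharp transitions
```

Because the output is bounded in $[-1, 1]$ for any $\beta$, swapping it for a sine activation in an existing coordinate network changes only the activation call. This is consistent with the plug-and-play claim above.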
( 2
min )
Graph Neural Networks (GNNs) have demonstrated remarkable success in modeling
complex relationships in graph-structured data. A recent innovation in this
field is the family of Differential Equation-Inspired Graph Neural Networks
(DE-GNNs), which leverage principles from continuous dynamical systems to model
information flow on graphs with built-in properties such as feature smoothing
or preservation. However, existing DE-GNNs rely on first- or second-order
temporal dependencies. In this paper, we propose a neural extension to those
pre-defined temporal dependencies. We show that our model, called TDE-GNN, can
capture a wide range of temporal dynamics that go beyond typical first- or
second-order methods, and provide use cases where existing temporal models are
challenged. We demonstrate the benefit of learning the temporal dependencies
using our method rather than using pre-defined temporal dynamics on several
graph benchmarks.
( 2
min )
In order to efficiently explore the chemical space of all possible small
molecules, a common approach is to compress the dimension of the system to
facilitate downstream machine learning tasks. Towards this end, we present a
data-driven approach for clustering potential energy landscapes of molecular
structures by applying recently developed Network Embedding techniques to
obtain latent variables defined through the embedding function. To scale up the
method, we also incorporate an entropy sensitive adaptive scheme for
hierarchical sampling of the energy landscape, based on Metadynamics and
Transition Path Theory. By taking into account the kinetic information implied
by a system's energy landscape, we are able to interpret dynamical node-node
relationships in reduced dimensions. We demonstrate the framework through
Lennard-Jones (LJ) clusters and a human DNA sequence.
( 2
min )
Speaker embeddings carry valuable emotion-related information, which makes
them a promising resource for enhancing speech emotion recognition (SER),
especially with limited labeled data. Traditionally, it has been assumed that
emotion information is indirectly embedded within speaker embeddings, leading
to their under-utilization. Our study reveals a direct and useful link between
emotion and state-of-the-art speaker embeddings in the form of intra-speaker
clusters. By conducting a thorough clustering analysis, we demonstrate that
emotion information can be readily extracted from speaker embeddings. In order
to leverage this information, we introduce a novel contrastive pretraining
approach applied to emotion-unlabeled data for speech emotion recognition. The
proposed approach samples positive and negative examples
based on the intra-speaker clusters of speaker embeddings. The proposed
strategy, which leverages extensive emotion-unlabeled data, leads to a
significant improvement in SER performance, whether employed as a standalone
pretraining task or integrated into a multi-task pretraining setting.
( 2
min )
We sample from a given target distribution by constructing a neural network
which maps samples from a simple reference, e.g. the standard normal
distribution, to samples from the target. To that end, we propose using a
neural network architecture inspired by the Langevin Monte Carlo (LMC)
algorithm. Based on LMC perturbation results, we show approximation rates of
the proposed architecture for smooth, log-concave target distributions measured
in the Wasserstein-$2$ distance. The analysis heavily relies on the notion of
sub-Gaussianity of the intermediate measures of the perturbed LMC process. In
particular, we derive bounds on the growth of the intermediate variance proxies
under different assumptions on the perturbations. Moreover, we propose an
architecture similar to deep residual neural networks and derive expressivity
results for approximating the map that transports reference samples to the target distribution.
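The link to LMC can be made concrete. One unadjusted LMC step, $x_{k+1} = x_k - h\nabla U(x_k) + \sqrt{2h}\,\xi_k$, where $U$ is the target's negative log-density and $\xi_k$ is standard Gaussian noise, has the form of a residual layer. Stacking $K$ such steps therefore gives a deep map from reference samples to approximate target samples. The sketch below uses a hypothetical Gaussian target with an exact gradient and a hand-picked step size $h$. The paper's architecture instead works with perturbed versions of these layers.

```python
import numpy as np

def lmc_layer(x, grad_U, h, rng):
    """One LMC step as a residual layer: x - h * grad U(x) + sqrt(2h) * noise."""
    return x - h * grad_U(x) + np.sqrt(2.0 * h) * rng.standard_normal(x.shape)

def lmc_network(x0, grad_U, h, n_layers, seed=0):
    """Compose n_layers LMC layers, mapping reference samples x0 toward the target."""
    rng = np.random.default_rng(seed)
    x = x0
    for _ in range(n_layers):
        x = lmc_layer(x, grad_U, h, rng)
    return x

# Hypothetical log-concave target N(m, s^2), i.e. U(x) = (x - m)^2 / (2 s^2).
m, s = 3.0, 2.0

def grad_U(x):
    return (x - m) / s**2

x0 = np.random.default_rng(1).standard_normal(50_000)  # standard normal reference
xs = lmc_network(x0, grad_U, h=0.05, n_layers=1000)
print(xs.mean(), xs.std())  # roughly 3 and 2 (the step size h adds a small bias)
```

The small bias in the spread comes from time discretization. For this Gaussian target, the chain's stationary variance is $s^2/(1 - h/(2s^2))$ rather than $s^2$, which is one reason approximation rates for such architectures depend on $h$ and on the perturbations.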
( 2
min )
We establish finite-sample guarantees for efficient proper learning of
bounded-degree polytrees, a rich class of high-dimensional probability
distributions and a subclass of Bayesian networks, a widely studied type of
graphical model. Recently, Bhattacharyya et al. (2021) obtained finite-sample
guarantees for recovering tree-structured Bayesian networks, i.e., 1-polytrees.
We extend their results by providing an efficient algorithm which learns
$d$-polytrees in polynomial time and sample complexity for any bounded $d$ when
the underlying undirected graph (skeleton) is known. We complement our
algorithm with an information-theoretic sample complexity lower bound, showing
that the dependence on the dimension and target accuracy parameters is nearly
tight.
( 2
min )
Stochastic differential equations (SDEs) have been widely used to model
real-world random phenomena. Existing works mainly focus on the case where the time
series is modeled by a single SDE, which might be restrictive for modeling time
series with distributional shift. In this work, we propose a change point
detection algorithm for time series modeled as neural SDEs. Given a time series
dataset, the proposed method jointly learns the unknown change points and the
parameters of distinct neural SDE models corresponding to each change point.
Specifically, the SDEs are learned under the framework of generative
adversarial networks (GANs) and the change points are detected based on the
output of the GAN discriminator in a forward pass. At each step of the proposed
algorithm, the change points and the SDE model parameters are updated in an
alternating fashion. Numerical results on both synthetic and real datasets are
provided to validate the performance of our algorithm in comparison to
classical change point detection benchmarks, standard GAN-based neural SDEs,
and other state-of-the-art deep generative models for time series data.
( 2
min )
In this paper, we present conditions for identifying the generator of a
linear stochastic differential equation (SDE) from the distribution of its
solution process with a given fixed initial state. These identifiability
conditions are crucial in causal inference using linear SDEs as they enable the
identification of post-intervention distributions from the observational
distribution. Specifically, we derive a sufficient and necessary condition for
identifying the generator of linear SDEs with additive noise, as well as a
sufficient condition for identifying the generator of linear SDEs with
multiplicative noise. We show that the conditions derived for both types of
SDEs are generic. Moreover, we offer geometric interpretations of the derived
identifiability conditions to enhance their understanding. To validate our
theoretical results, we perform a series of simulations, which support and
substantiate the established findings.
( 2
min )
In non-asymptotic learning, variance-type parameters of sub-Gaussian
distributions are of paramount importance. However, directly estimating these
parameters using the empirical moment generating function (MGF) is infeasible.
To address this, we suggest using the sub-Gaussian intrinsic moment norm
[Buldygin and Kozachenko (2000), Theorem 1.3] achieved by maximizing a sequence
of normalized moments. Significantly, the suggested norm can not only
reconstruct the exponential moment bounds of MGFs but also provide tighter
sub-Gaussian concentration inequalities. In practice, we provide an intuitive
method for assessing whether data with a finite sample size is sub-Gaussian,
utilizing the sub-Gaussian plot. The intrinsic moment norm can be robustly
estimated via a simple plug-in approach. Our theoretical findings are also
applicable to reinforcement learning, including the multi-armed bandit
scenario.
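As we understand it, for a zero-mean $X$ the intrinsic moment norm can be written as $\|X\|_G = \max_{k \ge 1} \big[\tfrac{2^k k!}{(2k)!}\,\mathbb{E}X^{2k}\big]^{1/(2k)}$. For $X \sim N(0,\sigma^2)$, every term in the maximum equals $\sigma$. The plug-in sketch below replaces $\mathbb{E}X^{2k}$ by sample averages, centres the data first, and truncates the maximum at a hypothetical `k_max`, because high empirical moments are unstable. These choices are ours and may differ from the paper's estimator.

```python
import math
import numpy as np

def intrinsic_moment_norm(sample, k_max=5):
    """Plug-in estimate of max_{1<=k<=k_max} [2^k k! / (2k)! * E X^(2k)]^(1/(2k))."""
    x = np.asarray(sample, dtype=float)
    x = x - x.mean()                    # centre: the norm is stated for zero-mean X
    terms = []
    for k in range(1, k_max + 1):
        moment = np.mean(x ** (2 * k))  # empirical E X^(2k)
        scale = 2**k * math.factorial(k) / math.factorial(2 * k)
        terms.append((scale * moment) ** (1.0 / (2 * k)))
    return max(terms)

rng = np.random.default_rng(0)
print(intrinsic_moment_norm(rng.normal(0.0, 2.0, 200_000)))  # close to sigma = 2
print(intrinsic_moment_norm([1.0, -1.0, 1.0, -1.0]))         # 1.0 (Rademacher data)
```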
( 2
min )
Posterior collapse in the variational autoencoder (VAE), where the
variational posterior distribution closely matches the prior distribution, can
hinder the quality of the learned latent variables. As a consequence of
posterior collapse, the latent variables extracted by the encoder in VAE
preserve less information from the input data and thus fail to produce
meaningful representations as input to the reconstruction process in the
decoder. While this phenomenon has been an actively addressed topic related to
VAE performance, the theory for posterior collapse remains underdeveloped,
especially beyond the standard VAE. In this work, we advance the theoretical
understanding of posterior collapse to two important and prevalent yet less
studied classes of VAE: conditional VAE and hierarchical VAE. Specifically, via
a non-trivial theoretical analysis of linear conditional VAE and hierarchical
VAE with two levels of latent variables, we prove that the causes of posterior
collapse in these models include the correlation between the input and output of the
conditional VAE and the effect of learnable encoder variance in the
hierarchical VAE. We empirically validate our theoretical findings for linear
conditional and hierarchical VAE and demonstrate that these results are also
predictive for non-linear cases with extensive experiments.
( 3
min )
We propose a method for estimation and inference for bounds for heterogeneous
causal effect parameters in general sample selection models where the treatment
can affect whether an outcome is observed and no exclusion restrictions are
available. The method provides conditional effect bounds as functions of
policy-relevant pre-treatment variables. It allows for conducting valid statistical
inference on the unidentified conditional effects. We use a flexible
debiased/double machine learning approach that can accommodate non-linear
functional forms and high-dimensional confounders. Easily verifiable high-level
conditions for estimation, misspecification robust confidence intervals, and
uniform confidence bands are provided as well. We re-analyze data from a
large-scale field experiment on Facebook on counter-attitudinal news subscription
with attrition. Our method yields substantially tighter effect bounds compared
to conventional methods and suggests depolarization effects for younger users.
( 2
min )
We study online multiclass classification under bandit feedback. We extend
the results of Daniely and Helbertal [2013] by showing that the finiteness of
the Bandit Littlestone dimension is necessary and sufficient for bandit online
learnability even when the label space is unbounded. Moreover, we show that,
unlike the full-information setting, sequential uniform convergence is
necessary but not sufficient for bandit online learnability. Our result
complements the recent work by Hanneke, Moran, Raman, Subedi, and Tewari [2023]
who show that the Littlestone dimension characterizes online multiclass
learnability in the full-information setting even when the label space is
unbounded.
( 2
min )
We present the new Orthogonal Polynomials Approximation Algorithm (OPAA), a
parallelizable algorithm that estimates probability distributions using
a functional-analytic approach: first, it finds a smooth functional estimate of
the probability distribution, whether it is normalized or not; second, the
algorithm provides an estimate of the normalizing weight; and third, the
algorithm proposes a new computation scheme to compute such estimates.
A core component of OPAA is a special transform of the square root of the
joint distribution into a special functional space that we construct. Through
this transform, the evidence is equated with the $L^2$ norm of the transformed
function, squared. Hence, the evidence can be estimated by the sum of squares
of the transform coefficients. Computations can be parallelized and completed
in one pass.
OPAA can be applied broadly to the estimation of probability density
functions. In Bayesian problems, it can be applied to estimating the
normalizing weight of the posterior, which is also known as the evidence,
serving as an alternative to existing optimization-based methods.
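The identity behind the evidence estimate is Parseval's. If $\sqrt{f} = \sum_n c_n \phi_n$ in an orthonormal basis $\{\phi_n\}$ of $L^2$, then $Z = \int f = \|\sqrt{f}\|_2^2 = \sum_n c_n^2$. The sketch below illustrates this with orthonormal Hermite functions. These are our stand-in, not the paper's specially constructed space. The example is a one-dimensional unnormalized Gaussian whose evidence $3\sqrt{2\pi}$ is known in closed form, and the coefficients are computed with a simple Riemann sum on a grid.

```python
import numpy as np

def hermite_functions(x, n_max):
    """Orthonormal Hermite functions psi_0..psi_{n_max-1} via the three-term recurrence."""
    psi = np.zeros((n_max, x.size))
    psi[0] = np.pi ** -0.25 * np.exp(-x**2 / 2)
    if n_max > 1:
        psi[1] = np.sqrt(2.0) * x * psi[0]
    for n in range(1, n_max - 1):
        psi[n + 1] = np.sqrt(2.0 / (n + 1)) * x * psi[n] - np.sqrt(n / (n + 1)) * psi[n - 1]
    return psi

def evidence_from_sqrt_coefficients(f, n_max=40, half_width=20.0, n_grid=4001):
    """Estimate Z = integral of f as the sum of squared basis coefficients of sqrt(f)."""
    x = np.linspace(-half_width, half_width, n_grid)
    dx = x[1] - x[0]
    coeffs = hermite_functions(x, n_max) @ np.sqrt(f(x)) * dx  # c_n = <sqrt f, psi_n>
    return np.sum(coeffs**2)

def f(x):
    """Unnormalized density with known evidence Z = 3 * sqrt(2 * pi)."""
    return 3.0 * np.exp(-x**2 / 2)

print(evidence_from_sqrt_coefficients(f), 3.0 * np.sqrt(2 * np.pi))
```

Each coefficient is an independent inner product, so in this sketch the coefficients could be computed in parallel in a single pass over the grid.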
( 2
min )
In recent times machine learning methods have made significant advances in
becoming a useful tool for analyzing physical systems. A particularly active
area in this theme has been "physics-informed machine learning" which focuses
on using neural nets for numerically solving differential equations. In this
work, we aim to advance the theory of measuring out-of-sample error while
training DeepONets, which are among the most versatile ways to solve PDE
systems in one shot.
Firstly, for a class of DeepONets, we prove a bound on their Rademacher
complexity which does not explicitly scale with the width of the nets involved.
Secondly, we use this to show how the Huber loss can be chosen so that for
these DeepONet classes generalization error bounds can be obtained that have no
explicit dependence on the size of the nets. We note that our theoretical
results apply to any PDE that DeepONets are trained to solve.
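For readers unfamiliar with the architecture, an unstacked DeepONet has two parts. A branch net reads the input function $u$ at $m$ fixed sensor points, and a trunk net reads a query location $y$. Their outputs are combined through an inner product, $G(u)(y) \approx \sum_k b_k(u)\,t_k(y)$. The sketch below shows that forward pass with tiny random one-hidden-layer nets, together with the Huber loss mentioned above. All sizes, and the threshold `delta`, are hypothetical placeholders, not the paper's choices.

```python
import numpy as np

rng = np.random.default_rng(0)
m, p, width = 50, 16, 32  # sensors, latent size, hidden width (all hypothetical)

def init_mlp(n_in, n_out):
    """Random parameters for a one-hidden-layer network."""
    return (rng.standard_normal((n_in, width)) / np.sqrt(n_in), np.zeros(width),
            rng.standard_normal((width, n_out)) / np.sqrt(width), np.zeros(n_out))

def mlp(params, z):
    W1, b1, W2, b2 = params
    return np.tanh(z @ W1 + b1) @ W2 + b2

branch, trunk = init_mlp(m, p), init_mlp(1, p)

def deeponet(u_sensors, y):
    """G(u)(y) ~ sum_k b_k(u) t_k(y); u_sensors: (batch, m), y: (batch, 1)."""
    return np.sum(mlp(branch, u_sensors) * mlp(trunk, y), axis=-1)

def huber(residual, delta=1.0):
    """Huber loss: quadratic for |r| <= delta, linear beyond."""
    a = np.abs(residual)
    return np.where(a <= delta, 0.5 * a**2, delta * (a - 0.5 * delta))

u = rng.standard_normal((8, m))    # 8 input functions sampled at m sensors
y = rng.uniform(0.0, 1.0, (8, 1))  # one query location per function
target = np.zeros(8)               # placeholder operator values G(u)(y)
print(huber(deeponet(u, y) - target).mean())
```

The Huber loss grows only linearly for large residuals. That bounded slope is the kind of property a generalization analysis can exploit, though the specific choice of Huber loss in the paper is not reproduced here.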
( 2
min )
Identifying important features linked to a response variable is a fundamental
task in various scientific domains. This article explores statistical inference
for simulated Markov random fields in high-dimensional settings. We introduce a
methodology based on Markov Chain Monte Carlo Maximum Likelihood Estimation
(MCMC-MLE) with Elastic-net regularization. Under mild conditions on the MCMC
method, our penalized MCMC-MLE method achieves $\ell_{1}$-consistency. We
propose a decorrelated score test, establishing both its asymptotic normality
and that of a one-step estimator, along with the associated confidence
interval. Furthermore, we construct two false discovery rate control procedures
via the asymptotic behaviors for both p-values and e-values. Comprehensive
numerical simulations confirm the theoretical validity of the proposed methods.
( 2
min )
The manifold scattering transform is a deep feature extractor for data
defined on a Riemannian manifold. It is one of the first examples of extending
convolutional neural network-like operators to general manifolds. The initial
work on this model focused primarily on its theoretical stability and
invariance properties but did not provide methods for its numerical
implementation except in the case of two-dimensional surfaces with predefined
meshes. In this work, we present practical schemes, based on the theory of
diffusion maps, for applying the manifold scattering transform to datasets
arising in naturalistic systems, such as single cell genetics, where the data
is a high-dimensional point cloud modeled as lying on a low-dimensional
manifold. We show that our methods are effective for signal classification and
manifold classification tasks.
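One common way to make this concrete on a point cloud follows diffusion maps. Build a row-stochastic diffusion operator $P$ from a Gaussian kernel, form diffusion wavelets $\Psi_0 = I - P$ and $\Psi_j = P^{2^{j-1}} - P^{2^j}$, and summarize $|\Psi_j x|$ for a signal $x$ on the points by its moments. The sketch below makes its own choices (bandwidth `eps`, number of scales `J`, first-order coefficients only), and the paper's normalization and depth may differ.

```python
import numpy as np

def diffusion_operator(points, eps):
    """Row-stochastic diffusion matrix P from a Gaussian kernel on a point cloud."""
    sq_dists = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / eps)
    return K / K.sum(axis=1, keepdims=True)

def first_order_scattering(points, signal, eps=0.5, J=4, q=(1, 2)):
    """Moments of |Psi_j signal|, with Psi_0 = I - P and Psi_j = P^(2^(j-1)) - P^(2^j)."""
    P = diffusion_operator(points, eps)
    P_pow = [P]                              # P_pow[j] = P^(2^j)
    for _ in range(J):
        P_pow.append(P_pow[-1] @ P_pow[-1])  # repeated squaring
    wavelets = [np.eye(len(points)) - P] + [P_pow[j - 1] - P_pow[j] for j in range(1, J + 1)]
    return np.array([np.mean(np.abs(W @ signal) ** k) for W in wavelets for k in q])

theta = np.linspace(0.0, 2 * np.pi, 200, endpoint=False)
points = np.stack([np.cos(theta), np.sin(theta)], axis=1)  # samples from a circle in R^2
smooth = first_order_scattering(points, np.cos(theta))
rough = first_order_scattering(points, np.cos(8 * theta))
print(np.round(smooth, 3))
print(np.round(rough, 3))
```

A constant signal yields all-zero coefficients because the rows of $P$ sum to one. Oscillating signals put more energy into the small-$j$ (fine-scale) wavelets, which is what makes these features useful for signal classification.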
( 2
min )
We explore a stochastic contextual linear bandit problem where the agent
observes a noisy, corrupted version of the true context through a noise channel
with an unknown noise parameter. Our objective is to design an action policy
that can approximate that of an oracle, which has access to the reward model,
the channel parameter, and the predictive distribution of the true context from
the observed noisy context. In a Bayesian framework, we introduce a Thompson
sampling algorithm for Gaussian bandits with Gaussian context noise. Adopting
an information-theoretic analysis, we characterize the Bayesian regret of our
algorithm with respect to the oracle's action policy. We also extend this problem to
a scenario where the agent observes the true context with some delay after
receiving the reward and show that delayed true contexts lead to lower Bayesian
regret. Finally, we empirically demonstrate the performance of the proposed
algorithms against baselines.
( 2
min )
Approximate Thompson sampling with Langevin Monte Carlo broadens its reach
from Gaussian posterior sampling to encompass more general smooth posteriors.
However, it still encounters scalability issues in high-dimensional problems
when demanding high accuracy. To address this, we propose an approximate
Thompson sampling strategy, utilizing underdamped Langevin Monte Carlo, where
the latter is the go-to workhorse for simulations of high-dimensional
posteriors. Based on the standard smoothness and log-concavity conditions, we
study the accelerated posterior concentration and sampling using a specific
potential function. This design improves the sample complexity for realizing
logarithmic regret from $\tilde{\mathcal{O}}(d)$ to
$\tilde{\mathcal{O}}(\sqrt{d})$. The scalability and robustness of our algorithm are also
empirically validated through synthetic experiments in high-dimensional bandit
problems.
( 2
min )
There has been considerable recent interest in estimating heterogeneous
causal effects. In this paper, we introduce conditional average partial causal
effects (CAPCE) to reveal the heterogeneity of causal effects with continuous
treatment. We provide conditions for identifying CAPCE in an instrumental
variable setting. We develop three families of CAPCE estimators: sieve,
parametric, and reproducing kernel Hilbert space (RKHS)-based, and analyze
their statistical properties. We illustrate the proposed CAPCE estimators on
synthetic and real-world data.
( 2
min )
Methods for estimating heterogeneous treatment effects (HTE) from
observational data have largely focused on continuous or binary outcomes, with
less attention paid to survival outcomes and almost none to settings with
competing risks. In this work, we develop censoring unbiased transformations
(CUTs) for survival outcomes both with and without competing risks. After
converting time-to-event outcomes using these CUTs, direct application of HTE
learners for continuous outcomes yields consistent estimates of heterogeneous
cumulative incidence effects, total effects, and separable direct effects. Our
CUTs enable application of a much larger set of state-of-the-art HTE learners
for censored outcomes than had previously been available, especially in
competing risks settings. We provide generic model-free learner-specific oracle
inequalities bounding the finite-sample excess risk. The oracle efficiency
results depend on the oracle selector and estimated nuisance functions from all
steps involved in the transformation. We demonstrate the empirical performance
of the proposed methods in simulation studies.
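The simplest member of this family of transformations is inverse probability of censoring weighting (IPCW). Let $\tilde T = \min(T, C)$ be the observed time, $\Delta = 1\{T \le C\}$ the event indicator, and $G(t) = P(C \ge t)$ the censoring survival function. The transformed outcome $Y^* = \Delta \tilde T / G(\tilde T)$ then satisfies $\mathbb{E}[Y^*] = \mathbb{E}[T]$ under independent censoring, so a learner for continuous outcomes can be fitted to $Y^*$. The sketch below is our own illustration, without covariates or competing risks, and is not the paper's CUTs. It estimates $G$ by Kaplan-Meier with censorings treated as failures and assumes untied event times.

```python
import numpy as np

def censoring_km_left(times, events):
    """Kaplan-Meier estimate of G(t-) = P(C >= t) at each subject's observed time.

    Censorings play the role of 'failures'; assumes continuous times with no ties.
    """
    order = np.argsort(times)
    n = len(times)
    at_risk = n - np.arange(n)                   # risk-set size at each sorted time
    censored = (events[order] == 0).astype(float)
    factors = 1.0 - censored / at_risk
    # Product over strictly earlier times gives the left limit G(t-).
    g_sorted = np.concatenate(([1.0], np.cumprod(factors)[:-1]))
    g = np.empty(n)
    g[order] = g_sorted
    return g

def ipcw_transform(times, events):
    """IPCW censoring unbiased transformation: Y* = Delta * T_obs / G(T_obs-)."""
    return events * times / censoring_km_left(times, events)

# Simulated example: T ~ Exp(mean 1), independent censoring C ~ Exp(mean 2).
rng = np.random.default_rng(0)
T = rng.exponential(1.0, 100_000)
C = rng.exponential(2.0, 100_000)
times, events = np.minimum(T, C), (T <= C).astype(float)

print(times.mean())                          # naive mean, about E[min(T, C)] = 2/3
print(ipcw_transform(times, events).mean())  # close to E[T] = 1
```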
( 2
min )
Variational families with full-rank covariance approximations are known not
to work well in black-box variational inference (BBVI), both empirically and
theoretically. In fact, recent computational complexity results for BBVI have
established that full-rank variational families scale poorly with the
dimensionality of the problem compared to, e.g., mean-field families. This is
particularly critical for hierarchical Bayesian models with local variables;
their dimensionality increases with the size of the datasets. Consequently, one
gets an iteration complexity with an explicit $\mathcal{O}(N^2)$ dependence on
the dataset size $N$. In this paper, we explore a theoretical middle ground
between mean-field variational families and full-rank families: structured
variational families. We rigorously prove that certain scale matrix structures
can achieve a better iteration complexity of $\mathcal{O}(N)$, implying better
scaling with respect to $N$. We empirically verify our theoretical results on
large-scale hierarchical models.
( 2
min )
Quantum machine learning, which involves running machine learning algorithms
on quantum devices, has garnered significant attention in both academic and
business circles. In this paper, we offer a comprehensive and unbiased review
of the various concepts that have emerged in the field of quantum machine
learning. This includes techniques used in Noisy Intermediate-Scale Quantum
(NISQ) technologies and approaches for algorithms compatible with
fault-tolerant quantum computing hardware. Our review covers fundamental
concepts, algorithms, and the statistical learning theory pertinent to quantum
machine learning.
( 2
min )
This paper addresses second-order stochastic optimization for estimating the
minimizer of a convex function written as an expectation. A direct recursive
estimation technique for the inverse Hessian matrix using a Robbins-Monro
procedure is introduced. This approach drastically reduces
computational complexity. Above all, it enables the development of universal stochastic
Newton methods and the study of the asymptotic efficiency of the proposed
approach. This work thus expands the application scope of second-order algorithms
in stochastic optimization.
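One simple way to estimate $H^{-1} = (\mathbb{E}[h])^{-1}$ directly, without inverting a matrix at every step, is the Robbins-Monro recursion $A_{n+1} = A_n - \gamma_n (A_n h_n - I)$. Its mean field $A H - I$ vanishes exactly at $A = H^{-1}$. The sketch below applies it to noisy samples of a fixed $2 \times 2$ Hessian. The recursion, the step sizes $\gamma_n = c/(n + n_0)$, and the noise model are our assumptions and may differ from the estimator in the paper.

```python
import numpy as np

def rm_inverse_hessian(sample_hessian, dim, n_steps, c=2.0, n0=10, seed=0):
    """Robbins-Monro estimate of (E[h])^{-1} from i.i.d. noisy Hessian samples h_n."""
    rng = np.random.default_rng(seed)
    I = np.eye(dim)
    A = np.eye(dim)
    for n in range(1, n_steps + 1):
        h = sample_hessian(rng)
        gamma = c / (n + n0)
        A = A - gamma * (A @ h - I)  # fixed point of the mean update: A H = I
    return A

H = np.array([[2.0, 0.5], [0.5, 1.0]])

def noisy_hessian(rng):
    """Hypothetical noisy Hessian sample: H plus a symmetric zero-mean perturbation."""
    E = 0.1 * rng.standard_normal((2, 2))
    return H + (E + E.T) / 2

A = rm_inverse_hessian(noisy_hessian, dim=2, n_steps=20_000)
print(A)
print(np.linalg.inv(H))
```

Each update costs one matrix product and no inversion. In a stochastic Newton method, $A_n$ would then precondition the stochastic gradient step.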
( 2
min )
Machine learning can be overwhelming with its variety of tasks. Most tasks can be solved with a few ML algorithms. You need to be aware of which algorithms to select, when to apply them, what parameters to take into consideration, and how to test them. This guide was crafted to provide you with a straightforward…
The post Choosing the right machine learning algorithm for business success appeared first on Data Science Central.
( 23
min )
The automotive industry is being transformed by the integration of cutting-edge technologies into software-defined cars. At CES, NVIDIA invited industry leaders to share their perspectives on how technology, especially AI and computing power, is shaping the future of transportation. Watch the video to learn more from NVIDIA’s auto partners. Redefining Possibilities Through Partnership Magnus Ostberg,
( 6
min )
It’s hard to imagine an industry more competitive — or fast-paced — than online retail. Sellers need to create attractive and informative product listings that must be engaging, capture attention and generate trust. Amazon uses optimized containers on Amazon Elastic Compute Cloud (Amazon EC2) with NVIDIA Tensor Core GPUs to power a generative AI tool
( 5
min )
MetaOpt helps analyze, explain, and improve heuristic performance before deployment in production systems. Learn how it works, particularly in traffic engineering, packet scheduling, and VM placement.
The post MetaOpt: Examining, explaining, and improving heuristic performance appeared first on Microsoft Research.
( 10
min )
Mode collapse is a significant unsolved issue of generative adversarial
networks. In this work, we examine the causes of mode collapse from a novel
perspective. Due to the nonuniform sampling in the training process, some
sub-distributions may be missed when sampling data. As a result, even when the
generated distribution differs from the real one, the GAN objective can still
achieve the minimum. To address the issue, we propose a global distribution
fitting (GDF) method with a penalty term to confine the generated data
distribution. When the generated distribution differs from the real one, GDF
makes it harder for the objective to reach its minimum, while the original
global minimum is unchanged. To handle the case where the overall
real data distribution is unreachable, we also propose a local distribution fitting (LDF)
method. Experiments on several benchmarks demonstrate the effectiveness and
competitive performance of GDF and LDF.
( 2
min )
Machine-learned normalizing flows can be used in the context of lattice
quantum field theory to generate statistically correlated ensembles of lattice
gauge fields at different action parameters. This work demonstrates how these
correlations can be exploited for variance reduction in the computation of
observables. Three different proof-of-concept applications are demonstrated
using a novel residual flow architecture: continuum limits of gauge theories,
the mass dependence of QCD observables, and hadronic matrix elements based on
the Feynman-Hellmann approach. In all three cases, it is shown that statistical
uncertainties are significantly reduced when machine-learned flows are
incorporated as compared with the same calculations performed with uncorrelated
ensembles or direct reweighting.
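Correlated ensembles help because of the identity $\mathrm{Var}(\hat O_1 - \hat O_2) = \mathrm{Var}(\hat O_1) + \mathrm{Var}(\hat O_2) - 2\,\mathrm{Cov}(\hat O_1, \hat O_2)$. When an observable is measured at two nearby action parameters on correlated configurations, the covariance term cancels much of the noise in the difference. The toy Monte Carlo below compares the spread of an estimated difference with and without correlation. Gaussian draws stand in for measurements, the correlation $\rho$ is chosen arbitrarily, and no lattice field theory or flow is involved.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_samples, rho = 2000, 100, 0.95  # arbitrary toy settings

def estimated_difference(correlated):
    """Monte Carlo estimate of E[O1] - E[O2] (true value 0.1) from n_samples pairs."""
    z1 = rng.standard_normal(n_samples)
    noise = rng.standard_normal(n_samples)
    z2 = (rho * z1 + np.sqrt(1.0 - rho**2) * noise) if correlated else noise
    return np.mean((1.1 + z1) - (1.0 + z2))

stds = {}
for correlated in (False, True):
    diffs = np.array([estimated_difference(correlated) for _ in range(n_trials)])
    stds[correlated] = diffs.std()
    print(f"correlated={correlated}: mean={diffs.mean():.3f}, std={stds[correlated]:.4f}")
```

With $\rho = 0.95$, the standard deviation of the difference shrinks by a factor of about $1/\sqrt{1-\rho} \approx 4.5$ at the same sample size.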
( 2
min )
RNA, whose functionality is largely determined by its structure, plays an
important role in many biological activities. The prediction of pairwise
structural proximity between each pair of nucleotides in an RNA sequence can
characterize the structural information of the RNA. Historically, this problem
has been tackled by machine learning models using expert-engineered features
and trained on scarce labeled datasets. Here, we find that the knowledge
learned by a protein-coevolution Transformer-based deep neural network can be
transferred to the RNA contact prediction task. As protein datasets are orders
of magnitude larger than those for RNA contact prediction, our findings and the
subsequent framework greatly reduce the data scarcity bottleneck. Experiments
confirm that RNA contact prediction through transfer learning using a publicly
available protein model is greatly improved. Our findings indicate that the
learned structural patterns of proteins can be transferred to RNAs, opening up
potential new avenues for research.
( 2
min )
Recent years have seen a surge of interest in the algorithmic estimation of
stochastic entropy production (EP) from trajectory data via machine learning. A
crucial element of such algorithms is the identification of a loss function
whose minimization guarantees accurate EP estimation. In this study, we
show that there exists a host of loss functions, namely those implementing a
variational representation of the $\alpha$-divergence, which can be used for
the EP estimation. By fixing $\alpha$ to a value between $-1$ and $0$, the
$\alpha$-NEEP (Neural Estimator for Entropy Production) exhibits a much more
robust performance against strong nonequilibrium driving or slow dynamics,
which adversely affect the existing method based on the Kullback-Leibler
divergence ($\alpha = 0$). In particular, the choice of $\alpha = -0.5$ tends
to yield the optimal results. To corroborate our findings, we present an
exactly solvable simplification of the EP estimation problem, whose loss
function landscape and stochastic properties give deeper intuition into the
robustness of the $\alpha$-NEEP.
( 2
min )
This paper presents a new type of hybrid model for Bayesian optimization (BO)
adept at managing mixed variables, encompassing both quantitative (continuous
and integer) and qualitative (categorical) types. Our proposed hybrid
models (named hybridM) combine a Monte Carlo Tree Search (MCTS) structure for
categorical variables with Gaussian Processes (GPs) for continuous ones. hybridM
uses upper confidence bound tree search (UCTS) as its MCTS strategy,
showcasing the integration of the tree architecture into Bayesian optimization. Our
innovations, including dynamic online kernel selection in the surrogate
modeling phase and a unique UCTS search strategy, position our hybrid models as
an advancement in mixed-variable surrogate models. Numerical experiments
underscore the superiority of hybrid models, highlighting their potential in
Bayesian optimization.
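The upper-confidence-bound rule drives tree search over a categorical variable as follows. Each child (category) is scored by its mean reward plus an exploration bonus $c\sqrt{\ln N / n_i}$, unvisited children are tried first, and the highest-scoring child is selected. The sketch below is the textbook UCT selection step with a hypothetical constant $c$. The category names are hypothetical values of a categorical hyperparameter, and hybridM's own UCTS variant is not specified in the abstract.

```python
import math

def uct_select(stats, c=math.sqrt(2)):
    """Pick the child maximizing mean reward + c * sqrt(ln N / n_i); unvisited first.

    stats maps category -> (visit_count, total_reward).
    """
    total_visits = sum(n for n, _ in stats.values())
    best, best_score = None, -math.inf
    for category, (n, reward_sum) in stats.items():
        if n == 0:
            return category  # expand an unvisited child before scoring
        score = reward_sum / n + c * math.sqrt(math.log(total_visits) / n)
        if score > best_score:
            best, best_score = category, score
    return best

stats = {"adam": (10, 7.0), "sgd": (2, 1.2), "rmsprop": (5, 3.0)}
print(uct_select(stats))         # "sgd": exploration bonus outweighs its lower mean
print(uct_select(stats, c=0.0))  # "adam": pure exploitation
```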
( 2
min )
Computational modeling of artwork meaning is complex and difficult. This is
because art interpretation is multidimensional and highly subjective. This
paper experimentally investigated the degree to which a state-of-the-art Deep
Convolutional Neural Network (DCNN), a popular Machine Learning approach, can
correctly classify modern conceptual artworks into the galleries devised by
art curators. Two hypotheses were proposed, stating that the DCNN model uses
Exhibited Properties for classification, like shape and color, but not
Non-Exhibited Properties, such as historical context and artist intention. The
two hypotheses were experimentally validated using a methodology designed for
this purpose. A VGG-11 DCNN pre-trained on the ImageNet dataset was discriminatively
fine-tuned on handcrafted datasets designed from real-world
conceptual photography galleries. Experimental results supported the two
hypotheses showing that the DCNN model ignores Non-Exhibited Properties and
uses only Exhibited Properties for artwork classification. This work points to
current DCNN limitations, which should be addressed by future DNN models.
( 2
min )
Test log-likelihood is commonly used to compare different models of the same
data or different approximate inference algorithms for fitting the same
probabilistic model. We present simple examples demonstrating how comparisons
based on test log-likelihood can contradict comparisons according to other
objectives. Specifically, our examples show that (i) approximate Bayesian
inference algorithms that attain higher test log-likelihoods need not also
yield more accurate posterior approximations and (ii) conclusions about
forecast accuracy based on test log-likelihood comparisons may not agree with
conclusions based on root mean squared error.
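Here is a toy instance of point (ii), with numbers chosen by us rather than taken from the paper. Model A predicts every test point with error 0.5 but an overconfident standard deviation of 0.1. Model B errs by 1.0 with a standard deviation of 1.0. A has half the RMSE of B, yet B has a far higher Gaussian test log-likelihood.

```python
import numpy as np

def gaussian_test_loglik(y, mean, std):
    """Average Gaussian predictive log-density of the test targets."""
    return np.mean(-0.5 * np.log(2 * np.pi * std**2) - (y - mean) ** 2 / (2 * std**2))

def rmse(y, mean):
    return np.sqrt(np.mean((y - mean) ** 2))

y = np.zeros(4)                           # test targets
mean_a, std_a = y + 0.5, np.full(4, 0.1)  # more accurate but overconfident
mean_b, std_b = y + 1.0, np.full(4, 1.0)  # less accurate, wider predictive spread

print(rmse(y, mean_a), gaussian_test_loglik(y, mean_a, std_a))  # 0.5, about -11.12
print(rmse(y, mean_b), gaussian_test_loglik(y, mean_b, std_b))  # 1.0, about -1.42
```

The log-likelihood heavily penalizes A's narrow predictive spread, so ranking by test log-likelihood reverses the RMSE ranking.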
( 2
min )
In data-driven control and machine learning, a common requirement involves
breaking down large matrices into smaller, low-rank factors that possess
specific levels of sparsity. This paper introduces an innovative solution to
the orthogonal nonnegative matrix factorization (ONMF) problem. The objective
is to approximate input data by using two low-rank nonnegative matrices,
adhering to both orthogonality and $\ell_0$-norm sparsity constraints. The
proposed maximum-entropy-principle based framework ensures orthogonality and
sparsity of features or the mixing matrix, while maintaining nonnegativity in
both. Additionally, the methodology offers a quantitative determination of the
``true'' number of underlying features, a crucial hyperparameter for ONMF.
Experimental evaluation on synthetic and standard datasets highlights the
method's superiority in terms of sparsity, orthogonality, and computational
speed compared to existing approaches. Notably, the proposed method achieves
reconstruction errors comparable to or lower than those reported in the literature.
( 2
min )
Multimodal sentiment analysis aims to identify the emotions expressed by
individuals through visual, language, and acoustic cues. However, most of the
existing research efforts assume that all modalities are available during both
training and testing, making their algorithms susceptible to the missing
modality scenario. In this paper, we propose a novel knowledge-transfer network
to translate between different modalities to reconstruct the missing audio
modalities. Moreover, we develop a cross-modality attention mechanism to retain
the maximal information of the reconstructed and observed modalities for
sentiment prediction. Extensive experiments on three publicly available
datasets show significant improvements over baselines and results
comparable to previous methods that use complete multi-modality
supervision.
( 2
min )
Given the success of ChatGPT, LaMDA and other large language models (LLMs),
there has been an increase in development and usage of LLMs within the
technology sector and other sectors. Although LLMs have not yet surpassed
human intelligence, there will be a time when they do. Such LLMs can be referred
to as advanced LLMs. Currently, there is limited use of ethical artificial
intelligence (AI) principles and guidelines addressing advanced LLMs because we
have not reached that point yet. However, this is a problem: once we do reach that
point, we will not be adequately prepared to deal with its aftermath in an ethical
and optimal way, which will lead to undesired and unexpected consequences. This
paper addresses this issue by discussing what ethical AI principles and
guidelines can be used to address highly advanced LLMs.
( 2
min )
Computational efficiency and adversarial robustness are critical factors in
real-world engineering applications. Yet, conventional neural networks often
fall short in addressing both simultaneously, or even separately. As insights
from natural physical systems and the existing literature show, an input
convex architecture enhances computational efficiency, while a
Lipschitz-constrained architecture bolsters adversarial robustness. By
leveraging the strengths of convexity and Lipschitz continuity, we develop a
novel network architecture, termed Input Convex Lipschitz Recurrent Neural
Networks. This model outperforms existing recurrent units across a spectrum of
engineering tasks in terms of computational efficiency and adversarial
robustness. These tasks encompass benchmark MNIST image classification,
real-world solar irradiance prediction for Solar PV system planning at LHT
Holdings in Singapore, and real-time Model Predictive Control optimization for
a chemical reactor.
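
The two properties can be combined in a recurrent cell roughly as follows. This is a hypothetical sketch, not the paper's ICLRNN definition: it assumes the recurrent matrix is kept elementwise non-negative (so convexity in the inputs propagates through time) and that both weight matrices are rescaled to spectral norm at most 1 (so, with the 1-Lipschitz ReLU, the cell is Lipschitz in its inputs).

```python
import numpy as np

def spectral_normalize(m):
    # Divide by the largest singular value (when it exceeds 1) so the operator norm is <= 1.
    return m / max(np.linalg.norm(m, ord=2), 1.0)

class InputConvexLipschitzCell:
    """Hypothetical cell h_t = relu(U h_{t-1} + W x_t + b).

    - U >= 0 elementwise and relu is convex and non-decreasing, so every
      component of h_t is a convex function of the whole input sequence.
    - ||U||_2 <= 1, ||W||_2 <= 1 and relu is 1-Lipschitz, so the final state
      moves by at most sum_t ||x_t - x'_t|| when the inputs change.
    """

    def __init__(self, d_in, d_h, seed=0):
        rng = np.random.default_rng(seed)
        self.U = spectral_normalize(np.abs(rng.normal(size=(d_h, d_h))))
        self.W = spectral_normalize(rng.normal(size=(d_h, d_in)))
        self.b = rng.normal(size=d_h)

    def run(self, xs):
        h = np.zeros(self.U.shape[0])
        for x in xs:
            h = np.maximum(0.0, self.U @ h + self.W @ x + self.b)
        return h

cell = InputConvexLipschitzCell(d_in=3, d_h=4)
rng = np.random.default_rng(1)
seq_a, seq_b = rng.normal(size=(6, 3)), rng.normal(size=(6, 3))
h_a, h_b = cell.run(seq_a), cell.run(seq_b)
```

Convexity can be spot-checked by evaluating `cell.run` on convex combinations of `seq_a` and `seq_b`.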
Utilizing task-invariant prior knowledge extracted from related tasks,
meta-learning is a principled framework that enables learning a new task,
especially when data records are limited. A fundamental challenge in
meta-learning is how to quickly "adapt" the extracted prior in order to train a
task-specific model within a few optimization steps. Existing approaches deal
with this challenge using a preconditioner that enhances convergence of the
per-task training process. Though effective at locally representing a quadratic
training loss, these simple linear preconditioners can hardly capture complex
loss geometries. The present contribution addresses this limitation by learning
a nonlinear mirror map, which induces a versatile distance metric to enable
capturing and optimizing a wide range of loss geometries, hence facilitating
the per-task training. Numerical tests on few-shot learning datasets
demonstrate the superior expressiveness and convergence of the advocated
approach.
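
For intuition about what a mirror map does, the sketch below runs mirror descent with a fixed, classical mirror map, the negative entropy $\phi(\theta) = \sum_i \theta_i \log \theta_i$ on the probability simplex, rather than the learned nonlinear map proposed here. The step is taken in the mirror space, $\nabla\phi(\theta_{k+1}) = \nabla\phi(\theta_k) - \eta \nabla L(\theta_k)$, and mapped back, which yields multiplicative (exponentiated-gradient) updates. The quadratic loss and target are made-up examples.

```python
import numpy as np

def mirror_descent_simplex(grad_fn, theta0, lr=0.1, steps=200):
    """Mirror descent with the negative-entropy mirror map on the simplex.

    With grad_phi(t) = log t + 1, the mirror-space step gives
    theta_next proportional to theta * exp(-lr * grad L(theta)).
    """
    theta = np.asarray(theta0, dtype=float)
    for _ in range(steps):
        theta = theta * np.exp(-lr * grad_fn(theta))
        theta /= theta.sum()  # Bregman (KL) projection back onto the simplex
    return theta

# Toy loss L(theta) = 0.5 * ||theta - target||^2 with its minimiser on the simplex.
target = np.array([0.7, 0.2, 0.1])
theta = mirror_descent_simplex(lambda t: t - target, np.ones(3) / 3, lr=1.0, steps=500)
```

A learned mirror map replaces `log t + 1` with a trainable invertible function, changing the geometry in which each step is measured.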
Image registration has traditionally been done using two distinct approaches:
learning-based methods, relying on robust deep neural networks, and
optimization-based methods, applying complex mathematical transformations to
warp images accordingly. Both paradigms have advantages and disadvantages, and
in this work we seek to combine their respective strengths into a single
streamlined framework: the outputs of the learning-based method serve as
initial parameters for optimization, and computation is prioritized for the
image pairs with the greatest loss. Our investigations showed improvements of
up to 1.6% on test data while maintaining the same inference time, and a
substantial gain of 1.0 percentage points in deformation field smoothness.
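
The hybrid scheme can be sketched on a toy problem: 1-D point sets related by an unknown translation, where a stand-in "network" supplies initial translations and gradient-descent refinement is spent only on the pairs with the highest initial loss. The translation model, `budget`, and step size are hypothetical simplifications of image registration.

```python
import numpy as np

def mse(fixed, moving, t):
    return float(np.mean((moving + t - fixed) ** 2))

def refine(fixed, moving, t0, steps=50, lr=0.5):
    """Optimization stage: gradient descent on the translation t."""
    t = t0
    for _ in range(steps):
        grad = 2.0 * np.mean(moving + t - fixed)  # d/dt of the mean squared error
        t -= lr * grad
    return t

def hybrid_register(pairs, init_params, budget):
    """Refine only the `budget` pairs whose learned initial estimate fits worst."""
    losses = [mse(f, m, t) for (f, m), t in zip(pairs, init_params)]
    order = np.argsort(losses)[::-1]  # highest loss first
    params = list(init_params)
    for i in order[:budget]:
        f, m = pairs[i]
        params[i] = refine(f, m, params[i])
    return params

fixed0 = np.array([0.0, 1.0, 2.0])
pairs = [(fixed0, fixed0 - 2.0), (fixed0, fixed0 - 3.0)]  # true shifts: 2 and 3
params = hybrid_register(pairs, init_params=[1.9, 0.0], budget=1)
```

Only the second pair, whose initial estimate is far off, receives optimization steps; the first keeps its learned estimate.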
Transformer-based models excel in speech recognition. Existing efforts to
optimize Transformer inference, typically for long-context applications, center
on simplifying attention score calculations. However, streaming speech
recognition models usually process a limited number of tokens each time, making
attention score calculation less of a bottleneck. Instead, the bottleneck lies
in the linear projection layers of multi-head attention and feedforward
networks, constituting a substantial portion of the model size and contributing
significantly to computation, memory, and power usage.
To address this bottleneck, we propose folding attention, a technique
targeting these linear layers, significantly reducing model size and improving
memory and power efficiency. Experiments on on-device Transformer-based
streaming speech recognition models show that folding attention reduces model
size (and corresponding memory consumption) by up to 24% and power consumption
by up to 23%, without compromising model accuracy or increasing computation overhead.
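
The claim that linear projections, not attention scores, dominate in streaming can be checked with a back-of-the-envelope count. This sketch assumes a vanilla Transformer layer (four $d \times d$ attention projections, a two-matrix feedforward block, and $2n^2 d$ multiply-accumulates for scores plus the weighted sum over values); the sizes are illustrative and not taken from the paper's models.

```python
def per_layer_macs(n_tokens, d_model, d_ff):
    """Multiply-accumulate counts for one vanilla Transformer layer.

    Assumes Q, K, V and output projections (4 * d^2 per token), a two-matrix
    feedforward block (2 * d * d_ff per token), and attention scores QK^T plus
    the weighted sum over V (2 * n^2 * d in total, summed across heads).
    """
    projections = n_tokens * 4 * d_model**2
    feedforward = n_tokens * 2 * d_model * d_ff
    attention_scores = 2 * n_tokens**2 * d_model
    return {"linear": projections + feedforward, "attention_scores": attention_scores}

streaming = per_layer_macs(n_tokens=32, d_model=512, d_ff=2048)
long_context = per_layer_macs(n_tokens=8192, d_model=512, d_ff=2048)
print(streaming)  # → {'linear': 100663296, 'attention_scores': 1048576}
```

With a 32-token chunk the linear layers cost about 96 times more than the score computation, whereas with 8192 tokens the scores dominate, which is why long-context work targets attention.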
Stochastic generators are useful for estimating climate impacts on various
sectors. Projecting climate risk in various sectors, e.g. energy systems,
requires generators that are accurate (statistical resemblance to
ground-truth), reliable (do not produce erroneous examples), and efficient.
Leveraging data from the North American Land Data Assimilation System, we
introduce TemperatureGAN, a Generative Adversarial Network conditioned on
months, locations, and time periods, to generate 2 m above-ground atmospheric
temperatures at an hourly resolution. We propose evaluation methods and metrics
to measure the quality of generated samples. We show that TemperatureGAN
produces high-fidelity examples with good spatial representation and temporal
dynamics consistent with known diurnal cycles.
We use explainable neural networks to connect the evolutionary history of
dark matter halos with their density profiles. The network captures independent
factors of variation in the density profiles within a low-dimensional
representation, which we physically interpret using mutual information. Without
any prior knowledge of the halos' evolution, the network recovers the known
relation between the early-time assembly and the inner profile, and discovers
that the profile beyond the virial radius is described by a single parameter
capturing the most recent mass accretion rate. The results illustrate the
potential for machine-assisted scientific discovery in complicated
astrophysical datasets.
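
Mutual information between a latent coordinate and a physical quantity can be estimated in several ways; the sketch below uses a simple plug-in estimate from a 2-D histogram, which may differ from the estimator used in the paper. The variable names (`latent`, `accretion_rate`) and the synthetic data are hypothetical.

```python
import numpy as np

def mutual_information(x, y, bins=20):
    """Plug-in estimate of I(X; Y) in nats from a 2-D histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)  # marginal of y, shape (1, bins)
    nonzero = pxy > 0
    return float(np.sum(pxy[nonzero] * np.log(pxy[nonzero] / (px @ py)[nonzero])))

rng = np.random.default_rng(0)
latent = rng.normal(size=5000)                             # one latent coordinate
accretion_rate = 2.0 * latent + 0.1 * rng.normal(size=5000)  # strongly related quantity
unrelated = rng.normal(size=5000)                          # independent quantity
mi_related = mutual_information(latent, accretion_rate)
mi_unrelated = mutual_information(latent, unrelated)
```

Plug-in estimates are biased upward for independent variables (roughly by (bins - 1)^2 / (2N) nats), so small nonzero values should not be over-interpreted.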
Pseudorange errors are the root cause of localization inaccuracy in GPS.
Previous data-driven methods regress and eliminate pseudorange errors using
handcrafted intermediate labels. Unlike them, we propose an end-to-end GPS
localization framework, E2E-PrNet, to train a neural network for pseudorange
correction (PrNet) directly using the final task loss calculated with the
ground truth of GPS receiver states. The gradients of the loss with respect to
learnable parameters are backpropagated through a differentiable nonlinear
least squares optimizer to PrNet. The feasibility is verified with GPS data
collected by Android phones, showing that E2E-PrNet outperforms the
state-of-the-art end-to-end GPS localization methods.
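
The nonlinear least squares step at the heart of GPS positioning can be sketched with Gauss-Newton iterations for receiver position and clock bias. This NumPy version is not differentiable and only illustrates the kind of optimizer E2E-PrNet backpropagates through; the satellite coordinates and receiver state are synthetic. In an end-to-end setup, PrNet's corrections would be applied to the pseudoranges before this solver, which would be written in an autodiff framework.

```python
import numpy as np

def solve_position(sat_pos, pseudoranges, x0=None, iters=10):
    """Gauss-Newton for the state [x, y, z, b], with clock bias b in meters."""
    x = np.zeros(4) if x0 is None else np.asarray(x0, dtype=float)
    for _ in range(iters):
        diff = x[:3] - sat_pos                      # (m, 3) receiver minus satellite
        ranges = np.linalg.norm(diff, axis=1)
        residual = pseudoranges - (ranges + x[3])   # observed minus predicted
        # Jacobian of the predicted pseudorange w.r.t. [x, y, z, b].
        J = np.hstack([diff / ranges[:, None], np.ones((len(ranges), 1))])
        dx, *_ = np.linalg.lstsq(J, residual, rcond=None)
        x = x + dx
    return x

sats = np.array([[2.0e7, 0.0, 0.0], [0.0, 2.0e7, 0.0], [0.0, 0.0, 2.0e7],
                 [1.5e7, 1.5e7, 1.0e7], [-1.0e7, 1.5e7, 1.5e7]])
truth = np.array([3.0e6, 4.0e6, 4.0e6])
bias = 100.0
rho = np.linalg.norm(sats - truth, axis=1) + bias  # noise-free synthetic pseudoranges
est = solve_position(sats, rho)
```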
Colonization has sociohistorically shaped people's identities across various
dimensions, and colonial values and biases continue to be perpetuated by
sociotechnical systems. Sentiment analysis tools are one such category of
sociotechnical systems: although they are often used to guide practices such
as content moderation, less attention has been paid to how they may be
complicit in perpetuating coloniality. In this paper, we explore potential bias
in sentiment analysis tools in the context of Bengali communities that have
experienced and continue to experience the impacts of colonialism. Drawing on
the identity categories most impacted by colonialism among local Bengali
communities, we focused our analytic attention on gender, religion, and
nationality. We conducted an algorithmic audit of all sentiment analysis tools
for Bengali available on the Python Package Index (PyPI) and GitHub. Our
analyses showed that, despite similar semantic content and structure in the
inputs, Bengali sentiment analysis tools produce inconsistent outputs across
tools, exhibit bias between different identity categories, and respond
differently to different ways of expressing identity. Connecting our findings
with the colonially shaped sociocultural structures of Bengali communities, we
discuss the implications of downstream bias of sentiment analysis tools.
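
A template-based audit of the kind described can be sketched as follows: the same sentence templates are filled with different identity terms, and the mean score per identity group is compared. The scorer is a deliberately biased toy stand-in for a real Bengali sentiment tool, and the templates and identity terms are placeholders, not the paper's data.

```python
def audit_identity_gap(score_fn, templates, groups):
    """Mean sentiment per identity group over identical sentence templates.

    score_fn:  any callable str -> float (a stand-in for a sentiment tool)
    templates: sentences containing "{who}" as the identity slot
    groups:    mapping of group name -> list of identity terms
    """
    means = {}
    for group, terms in groups.items():
        scores = [score_fn(t.format(who=term)) for t in templates for term in terms]
        means[group] = sum(scores) / len(scores)
    gap = max(means.values()) - min(means.values())  # 0 means no measured gap
    return means, gap

templates = ["{who} went to the market.", "{who} is my neighbour."]
groups = {"group_a": ["identity term A"], "group_b": ["identity term B"]}

def toy_scorer(text):
    # Hypothetical biased tool: penalises one identity term regardless of context.
    return -0.5 if "term B" in text else 0.0

means, gap = audit_identity_gap(toy_scorer, templates, groups)
```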
This paper investigates the double descent phenomenon in two-layer neural
networks, focusing on the role of L1 regularization and representation
dimensions. It explores an alternative double descent phenomenon, named sparse
double descent. The study emphasizes the complex relationship between model
complexity, sparsity, and generalization, and suggests further research into
more diverse models and datasets. The findings contribute to a deeper
understanding of neural network training and optimization.
Black-box query-based attacks constitute significant threats to Machine
Learning as a Service (MLaaS) systems since they can generate adversarial
examples without accessing the target model's architecture and parameters.
Traditional defense mechanisms, such as adversarial training, gradient masking,
and input transformations, either impose substantial computational costs or
compromise the test accuracy of non-adversarial inputs. To address these
challenges, we propose an efficient defense mechanism, PuriDefense, that
employs random patch-wise purifications with an ensemble of lightweight
purification models at low inference cost. These models leverage local
implicit functions to rebuild the natural image manifold. Our theoretical
analysis suggests that this approach slows down the convergence of query-based
attacks by incorporating randomness into purifications. Extensive experiments
on CIFAR-10 and ImageNet validate the effectiveness of our proposed
purifier-based defense mechanism, demonstrating significant improvements in
robustness against query-based attacks.
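
The random patch-wise idea can be illustrated with a minimal NumPy sketch: the image is split into patches, and each patch is processed by a purifier drawn at random from an ensemble, so repeated queries see different transformations. The toy purifiers (a 3x3 box blur and the identity) are assumptions standing in for the paper's local-implicit-function models, and grayscale 2-D images are assumed.

```python
import numpy as np

def box_blur(patch):
    # 3x3 mean filter with edge padding: a toy stand-in for a learned purifier.
    padded = np.pad(patch, 1, mode="edge")
    h, w = patch.shape
    return sum(padded[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def identity_purifier(patch):
    return patch.copy()

def random_patchwise_purify(image, purifiers, patch=8, rng=None):
    """Purify each patch with a purifier drawn uniformly at random from the ensemble."""
    rng = np.random.default_rng() if rng is None else rng
    out = np.empty_like(image, dtype=float)
    for r in range(0, image.shape[0], patch):
        for c in range(0, image.shape[1], patch):
            f = purifiers[rng.integers(len(purifiers))]
            out[r:r + patch, c:c + patch] = f(image[r:r + patch, c:c + patch])
    return out

img = np.random.default_rng(0).random((32, 32))
purified = random_patchwise_purify(img, [box_blur, identity_purifier])
```

Because the purifier assignment changes from query to query, the gradient or score estimates an attacker accumulates are noisier, which is the intuition behind the slowed convergence.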
With the steady rise of AI in bio-technical applications and the widespread
adoption of genomics sequencing, a growing number of AI-based algorithms and
tools are entering the research and production stages, affecting critical
decision-making streams such as drug discovery and clinical outcomes. This
paper demonstrates the vulnerability of AI models often used in downstream
tasks on recognized public genomics datasets. We undermine model
robustness by deploying an attack that focuses on input transformation while
mimicking the real data and confusing the model decision-making, ultimately
yielding a pronounced deterioration in model performance. Further, we enhance
our approach by generating poisoned data using a variational autoencoder-based
model. Our empirical findings demonstrate a decline in model
performance, underscored by diminished accuracy and an upswing in false
positives and false negatives. Furthermore, we analyze the resulting
adversarial samples via spectral analysis, drawing conclusions about
countermeasures against such attacks.
Fair machine learning aims to prevent discrimination against individuals or
sub-populations based on sensitive attributes such as gender and race. In
recent years, causal inference methods have been increasingly used in fair
machine learning to measure unfairness by causal effects. However, current
methods assume that the true causal graph is given, which is often not the case in
real-world applications. To address this limitation, this paper proposes a
framework for achieving causal fairness based on the notion of interventions
when the true causal graph is partially known. The proposed approach involves
modeling fair prediction using a Partially Directed Acyclic Graph (PDAG),
which represents a class of causal DAGs that can be learned from observational
data combined with domain knowledge. The PDAG is used to measure causal
fairness, and a constrained optimization problem is formulated to balance
fairness and accuracy. Results on both simulated and real-world
datasets demonstrate the effectiveness of this method.
Neural networks with quadratic decision functions have been introduced as
alternatives to standard neural networks with affine linear ones. They are
advantageous when the objects to be identified have compact basic geometries
such as circles and ellipses. In this paper we investigate the use of such
ansatz functions for classification. In particular, we test and compare the
algorithm on the MNIST dataset for classification of handwritten digits and for
classification of subspecies. We also show that the implementation can be
based on the neural network structure in the software TensorFlow and Keras.
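
A single quadratic unit can carve out a disk, which a single affine unit (whose decision boundary is a line) cannot. The sketch below evaluates $f(x) = x^\top A x + b^\top x + c$ with NumPy; the parameters are set by hand for a unit circle rather than learned, and this is not the paper's TensorFlow/Keras implementation.

```python
import numpy as np

def quadratic_unit(x, A, b, c):
    """Quadratic decision function x^T A x + b^T x + c for a batch x of shape (n, d)."""
    return np.einsum("ni,ij,nj->n", x, A, x) + x @ b + c

# Unit circle centred at the origin: f(x) = x1^2 + x2^2 - 1 (negative inside).
A = np.eye(2)
b = np.zeros(2)
c = -1.0
points = np.array([[0.0, 0.0], [0.5, 0.5], [1.5, 0.0], [0.0, -2.0]])
inside = quadratic_unit(points, A, b, c) < 0  # → [True, True, False, False]
```

Off-diagonal or unequal diagonal entries in `A` tilt and stretch the boundary into an ellipse.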
This paper discusses the feasibility of using Large Language Models (LLMs) for
code generation, with a particular application in designing a RISC processor.
The paper also reviews the associated steps, such as parsing, tokenization,
encoding, the attention mechanism, token sampling, and iterations during code
generation. The generated code for the RISC components is verified through
testbenches and hardware implementation on an FPGA board. Four metrics are
used to compare the efficiency of using LLMs in programming: correct output on
the first iteration, number of errors embedded in the code, number of trials
required to achieve the code, and failure to generate the code after three
iterations. In all cases, the generated code had significant errors and human
intervention was always required to fix the bugs. LLMs can therefore be used
to complement a programmer's code design.
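
Of the steps listed, token sampling is the easiest to show in isolation. The sketch below implements generic temperature-scaled top-k sampling over a vector of next-token logits; the logits are made up, and the sampling settings of the LLM used in this paper are not known from the abstract.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=None, rng=None):
    """Temperature-scaled top-k sampling over a vector of next-token logits."""
    rng = np.random.default_rng() if rng is None else rng
    logits = np.asarray(logits, dtype=float) / temperature
    if top_k is not None:
        cutoff = np.sort(logits)[-top_k]  # k-th largest logit
        logits = np.where(logits >= cutoff, logits, -np.inf)  # drop the rest
    probs = np.exp(logits - logits.max())  # masked entries become exactly 0
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

token = sample_next_token([0.1, 2.0, -1.0], top_k=1)  # top_k=1 reduces to greedy decoding
```

Lower temperatures and smaller `top_k` make generation more deterministic; higher values increase diversity and, typically, the chance of errors in generated code.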
The use of low-rank adaptation (LoRA) with frozen pretrained language models
(PLMs) has become increasingly popular as a mainstream, resource-efficient
modeling approach for memory-constrained hardware. In this study, we first
explore how to enhance model performance by introducing various LoRA training
strategies, achieving relative word error rate reductions of 3.50% on the
public LibriSpeech dataset and of 3.67% on an internal dataset in the
messaging domain. To further characterize the stability of LoRA-based
second-pass speech recognition models, we examine their robustness against
input perturbations. These perturbations are based on homophone replacements,
and we introduce a novel metric called N-best Perturbation-based Rescoring
Robustness (NPRR); both are designed to measure the relative degradation in
the performance of rescoring models. Our experimental results indicate that,
compared with fully-tuned models and vanilla LoRA tuning baselines, advanced
variants of LoRA, such as dynamic rank-allocated LoRA, lead to performance
degradation under $1$-best perturbation but alleviate the degradation under
$N$-best perturbation. This suggests that a comprehensive selection is needed
when using LoRA-based adaptation for compute-cost savings and robust language
modeling.
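
For readers unfamiliar with LoRA, the core idea is a frozen weight matrix plus a trainable low-rank update. The NumPy sketch below follows the standard formulation, output $xW^\top + (\alpha/r)\, x A^\top B^\top$ with $B$ initialized to zero so training starts from the pretrained model; the rank, $\alpha$, and sizes are illustrative, and the dynamic rank allocation studied here is not shown.

```python
import numpy as np

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight                                # frozen pretrained weight
        self.A = rng.normal(scale=0.01, size=(rank, d_in))  # trainable
        self.B = np.zeros((d_out, rank))                    # trainable, zero-initialised
        self.scale = alpha / rank

    def __call__(self, x):
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_parameters(self):
        return self.A.size + self.B.size

rng = np.random.default_rng(1)
W = rng.normal(size=(64, 64))
x = rng.normal(size=(3, 64))
layer = LoRALinear(W, rank=4)
```

Here only 512 parameters would be trained instead of the 4096 in `W`, which is where the memory savings come from.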
Deep learning still has drawbacks in terms of trustworthiness, i.e., being
comprehensible, fair, safe, and reliable. To mitigate the potential risks of
AI, clear obligations associated with trustworthiness have been proposed via
regulatory guidelines, e.g., in the European AI Act. Therefore, a central
question is to what extent trustworthy deep learning can be realized.
Establishing the described properties constituting trustworthiness requires
that the factors influencing an algorithmic computation can be retraced, i.e.,
the algorithmic implementation is transparent. Motivated by the observation
that the current evolution of deep learning models necessitates a change in
computing technology, we derive a mathematical framework which enables us to
analyze whether a transparent implementation in a computing model is feasible.
As an example, we apply our trustworthiness framework to analyze deep learning
approaches for inverse problems in digital and analog computing models
represented by Turing and Blum-Shub-Smale Machines, respectively. Based on
previous results, we find that Blum-Shub-Smale Machines have the potential to
establish trustworthy solvers for inverse problems under fairly general
conditions, whereas Turing machines cannot guarantee trustworthiness to the
same degree.
In this paper, we formulate the multi-agent graph bandit problem as a
multi-agent extension of the graph bandit problem introduced by Zhang,
Johansson, and Li [CISS 57, 1-6 (2023)]. In our formulation, $N$ cooperative
agents travel on a connected graph $G$ with $K$ nodes. Upon arrival at each
node, agents observe a random reward drawn from a node-dependent probability
distribution. The reward of the system is modeled as a weighted sum of the
rewards the agents observe, where the weights capture the decreasing marginal
reward associated with multiple agents sampling the same node at the same time.
We propose an Upper Confidence Bound (UCB)-based learning algorithm,
Multi-G-UCB, and prove that its expected regret over $T$ steps is bounded by
$O(N\log(T)[\sqrt{KT} + DK])$, where $D$ is the diameter of graph $G$. Lastly,
we numerically test our algorithm by comparing it to alternative methods.
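
Two ingredients of the formulation can be sketched directly: a generic UCB1-style index for a node and a system reward in which additional agents on the same node contribute with diminishing weights. This is not Multi-G-UCB itself (its index and graph-traversal rule are not given in the abstract); the weight values and the per-node reward model are hypothetical.

```python
import math

def ucb_index(mean, count, t, c=2.0):
    """Generic UCB1-style index: empirical mean plus an exploration bonus."""
    if count == 0:
        return math.inf  # unvisited nodes are tried first
    return mean + math.sqrt(c * math.log(t) / count)

def system_reward(node_rewards, agent_nodes, weights):
    """Weighted sum of rewards with diminishing weights for agents sharing a node.

    weights[m] is the weight of the (m + 1)-th agent on the same node,
    e.g. [1.0, 0.5, 0.25] (hypothetical values).
    """
    total = 0.0
    for node in set(agent_nodes):
        k = agent_nodes.count(node)
        total += node_rewards[node] * sum(weights[:k])
    return total

# Two agents share node 0 and one agent sits on node 1.
reward = system_reward({0: 1.0, 1: 2.0}, [0, 0, 1], [1.0, 0.5, 0.25])  # → 3.5
```

The diminishing weights are what make spreading agents across promising nodes worthwhile instead of stacking them on the single best node.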
Geometric quantum machine learning based on equivariant quantum neural
networks (EQNN) recently appeared as a promising direction in quantum machine
learning. Despite the encouraging progress, studies are still limited to
theory, and the role of hardware noise in EQNN training has not yet been
explored. This work studies the behavior of EQNN models in the presence of
noise. We show that certain EQNN models can preserve equivariance under Pauli
channels, while this is not possible under the amplitude damping channel. We
claim that the symmetry breaking grows linearly in the number of layers and
noise strength. We support our claims with numerical data from simulations as
well as hardware up to 64 qubits. Furthermore, we provide strategies to enhance
the symmetry protection of EQNN models in the presence of noise.
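
Whether a noise channel preserves a symmetry can be checked numerically by testing whether it commutes with the symmetry action. The single-qubit sketch below assumes the symmetry group is the one generated by Pauli $X$ (an illustrative choice, not an EQNN from the paper): any Pauli channel commutes with conjugation by $X$, while amplitude damping does not, as the input $|0\rangle\langle 0|$ already shows.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0]).astype(complex)

def pauli_channel(rho, px, py, pz):
    # Paulis are Hermitian, so P rho P^dagger = P rho P.
    return ((1 - px - py - pz) * rho + px * X @ rho @ X
            + py * Y @ rho @ Y + pz * Z @ rho @ Z)

def amplitude_damping(rho, gamma):
    k0 = np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex)
    k1 = np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)
    return k0 @ rho @ k0.conj().T + k1 @ rho @ k1.conj().T

def commutes_with(channel, U, rho):
    """Check channel(U rho U^dagger) == U channel(rho) U^dagger for one input state."""
    lhs = channel(U @ rho @ U.conj().T)
    rhs = U @ channel(rho) @ U.conj().T
    return bool(np.allclose(lhs, rhs))

rho0 = np.array([[1, 0], [0, 0]], dtype=complex)  # |0><0|
pauli_ok = commutes_with(lambda r: pauli_channel(r, 0.1, 0.05, 0.02), X, rho0)  # → True
damping_ok = commutes_with(lambda r: amplitude_damping(r, 0.3), X, rho0)        # → False
```

A single input state suffices to exhibit a violation; for the Pauli channel the commutation holds for every state because $X P X = \pm P$ for each Pauli $P$.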
The lack of anomaly detection methods during mechanized tunnelling can cause
financial losses and drilling delays. On-site excavation requires hard
obstacles to be recognized prior to drilling in order to avoid damaging the
tunnel boring machine and to adjust the propagation velocity. The efficiency of
the structural anomaly detection can be increased with intelligent optimization
techniques and machine learning. In this research, the anomaly in a simple
structure is detected by comparing the experimental measurements of the
structural vibrations with numerical simulations using parameter estimation
methods.
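
Comparing measured vibrations with simulations to estimate a parameter can be illustrated on a single-degree-of-freedom oscillator: simulate the free response for candidate stiffness values and keep the one that best fits the record. The unit mass, damping value, grid search, and closed-form response are simplifying assumptions; in an anomaly-detection setting, an estimated stiffness far from the expected intact value would flag a possible obstacle.

```python
import numpy as np

def free_vibration(k, c, t):
    """Displacement of an underdamped unit-mass oscillator released from x=1 at rest."""
    omega = np.sqrt(k)
    zeta = c / (2.0 * omega)
    omega_d = omega * np.sqrt(1.0 - zeta**2)
    decay = np.exp(-zeta * omega * t)
    return decay * (np.cos(omega_d * t) + zeta * omega / omega_d * np.sin(omega_d * t))

def estimate_stiffness(measured, t, c, candidates):
    """Pick the stiffness whose simulated response best matches the measurement."""
    errors = [np.sum((free_vibration(k, c, t) - measured) ** 2) for k in candidates]
    return float(candidates[int(np.argmin(errors))])

t = np.linspace(0.0, 5.0, 501)
measured = free_vibration(100.0, 2.0, t)  # stand-in for a (noise-free) sensor record
k_hat = estimate_stiffness(measured, t, 2.0, np.linspace(50.0, 150.0, 101))  # → 100.0
```

Real measurements are noisy and models imperfect, so a gradient-based or Bayesian estimator would typically replace the grid search.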
Adversarial attacks on learning-based trajectory predictors have already been
demonstrated. However, there are still open questions about the effects of
perturbations on trajectory predictor inputs other than state histories, and
how these attacks impact downstream planning and control. In this paper, we
conduct a sensitivity analysis on two trajectory prediction models,
Trajectron++ and AgentFormer. We observe that among all inputs, almost all of
the perturbation sensitivities for Trajectron++ lie only within the most recent
state history time point, while perturbation sensitivities for AgentFormer are
spread across state histories over time. We additionally demonstrate that,
despite dominant sensitivity on state history perturbations, an undetectable
image map perturbation made with the Fast Gradient Sign Method can induce large
prediction error increases in both models. Even though image maps may
contribute only slightly to the prediction output of both models, this result
reveals that trajectory predictors are susceptible to, rather than robust
against, adversarial image perturbations. Using an
optimization-based planner and example perturbations crafted from sensitivity
results, we show how this vulnerability can cause a vehicle to come to a sudden
stop from moderate driving speeds.
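
The Fast Gradient Sign Method is a single step along the sign of the loss gradient. The sketch applies it to a toy linear predictor with an analytic gradient; the model, input, and $\epsilon$ are hypothetical, not Trajectron++ or AgentFormer, whose gradients would come from automatic differentiation.

```python
import numpy as np

def fgsm(x, grad_fn, epsilon):
    """Fast Gradient Sign Method: one step of size epsilon along sign(grad)."""
    return x + epsilon * np.sign(grad_fn(x))

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 10))  # toy linear map from input features to a short trajectory
x = rng.normal(size=10)       # stand-in for a flattened map or state input
y_true = W @ x + 0.1          # nearby ground-truth future positions

def loss(v):
    return float(np.sum((W @ v - y_true) ** 2))

def grad(v):
    return 2.0 * W.T @ (W @ v - y_true)  # analytic gradient of the squared error

x_adv = fgsm(x, grad, epsilon=0.05)
```

Every input element moves by at most `epsilon`, which is why such perturbations can stay visually undetectable while still increasing the prediction error.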
We introduce a cryptographic method to hide an arbitrary secret payload in
the response of a Large Language Model (LLM). A secret key is required to
extract the payload from the model's response, and without the key it is
provably impossible to distinguish between the responses of the original LLM
and the LLM that hides a payload. In particular, the quality of generated text
is not affected by the payload. Our approach extends a recent result of Christ,
Gunn and Zamir (2023) who introduced an undetectable watermarking scheme for
LLMs.
Tactics, Techniques and Procedures (TTPs) represent sophisticated attack
patterns in the cybersecurity domain, described encyclopedically in textual
knowledge bases. Identifying TTPs in cybersecurity writing, often called TTP
mapping, is an important and challenging task. Conventional learning approaches
often target the problem in the classical multi-class or multilabel
classification setting. This setting hinders the learning ability of the model
due to a large number of classes (i.e., TTPs), the inevitable skewness of the
label distribution and the complex hierarchical structure of the label space.
We formulate the problem in a different learning paradigm, where the assignment
of a text to a TTP label is decided by the direct semantic similarity between
the two, which avoids having every label compete over the entire, large label
space. To that end, we propose a neural matching architecture with an
effective sampling-based learn-to-compare mechanism, facilitating the learning
process of the matching model despite constrained resources.
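
The matching formulation can be illustrated with a deliberately simple similarity function: rank candidate labels by cosine similarity between bag-of-words vectors of the input text and each label description. The label descriptions are short paraphrases written for this example, and the bag-of-words encoder stands in for the learned neural matching model and its sampling-based training.

```python
from collections import Counter

import numpy as np

def bow_vector(text, vocab):
    counts = Counter(text.lower().split())
    return np.array([counts[w] for w in vocab], dtype=float)

def rank_labels(text, label_descriptions):
    """Rank labels by cosine similarity between the text and each label description."""
    docs = [text, *label_descriptions.values()]
    vocab = sorted({w for d in docs for w in d.lower().split()})
    q = bow_vector(text, vocab)
    scores = {}
    for label, desc in label_descriptions.items():
        v = bow_vector(desc, vocab)
        scores[label] = float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v) + 1e-12))
    return sorted(scores, key=scores.get, reverse=True)

labels = {
    "Phishing": "adversary sends email with malicious attachment or link",
    "Command and Scripting Interpreter": "adversary executes commands via powershell or shell scripts",
    "Data Encrypted for Impact": "adversary encrypts data on target systems",
}
ranking = rank_labels("the attacker sent an email with a malicious link", labels)
```

Because each label is scored independently against the text, new labels can be added without retraining a fixed-size classifier head.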
Malicious adversaries can attack machine learning models to infer sensitive
information or damage the system by launching a series of evasion attacks.
Although various works address privacy and security concerns, they focus on
individual defenses, whereas in practice models may undergo simultaneous
attacks. This study explores the combination of adversarial training and
differentially private training to defend against simultaneous attacks. While
differentially-private adversarial training, as presented in DP-Adv,
outperforms other state-of-the-art methods, it lacks formal privacy guarantees
and empirical validation. Thus, in this work, we benchmark
the performance of this technique using a membership inference attack and
empirically show that the resulting approach is as private as non-robust
private models. This work also highlights the need to explore privacy
guarantees in dynamic training paradigms.
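
A common baseline membership inference attack thresholds the per-example loss, predicting "member" when the loss is low. The sketch below computes its balanced accuracy; whether this is the attack used in the paper is an assumption, and the loss values are made up.

```python
import numpy as np

def loss_threshold_attack(member_losses, nonmember_losses, threshold):
    """Predict 'member' when an example's loss is below the threshold.

    Returns balanced accuracy: 0.5 is chance level, 1.0 is perfect separation.
    """
    tpr = np.mean(np.asarray(member_losses) < threshold)      # members caught
    tnr = np.mean(np.asarray(nonmember_losses) >= threshold)  # non-members rejected
    return float(0.5 * (tpr + tnr))

leaky = loss_threshold_attack([0.1, 0.2, 0.3], [0.5, 0.9, 1.2], threshold=0.4)
private = loss_threshold_attack([0.2, 0.6], [0.2, 0.6], threshold=0.4)
```

A model that is "as private as non-robust private models" in this sense would keep such attacks close to chance level.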
We give a procedure for computing group-level $(\epsilon, \delta)$-DP
guarantees for DP-SGD, when using Poisson sampling or fixed batch size
sampling. Up to discretization errors in the implementation, the DP guarantees
computed by this procedure are tight (assuming we release every intermediate
iterate).
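
The two sampling schemes differ in how a batch is formed, which matters for privacy accounting. The sketch shows only the sampling step, not the group-level DP accounting procedure itself; the function names are hypothetical.

```python
import numpy as np

def poisson_batch(n, q, rng):
    """Each of the n examples joins the batch independently with probability q."""
    return np.flatnonzero(rng.random(n) < q)

def fixed_size_batch(n, batch_size, rng):
    """Exactly batch_size distinct examples, drawn uniformly without replacement."""
    return rng.choice(n, size=batch_size, replace=False)

rng = np.random.default_rng(0)
variable_batch = poisson_batch(100, 0.1, rng)  # size is random, about 10 on average
fixed_batch = fixed_size_batch(100, 10, rng)   # size is always 10
```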
Neural networks have been employed for a wide range of processing
applications like image processing, motor control, object detection and many
others. Living neural networks offer the advantages of lower power consumption,
faster processing, and biological realism. Optogenetics offers high spatial and
temporal control over biological neurons and shows potential for training
living neural networks. This work proposes a simulated living neural network
trained indirectly by backpropagating STDP-based algorithms using precise
optogenetic activation, achieving accuracy comparable to that of traditional
neural network training algorithms.
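
STDP updates a synaptic weight based on the relative timing of pre- and postsynaptic spikes. The sketch is the standard pair-based exponential rule; the amplitudes and time constants are illustrative assumptions, and how the paper combines STDP with backpropagation and optogenetic stimulation is not reproduced.

```python
import math

def stdp_update(delta_t, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Pair-based STDP weight change for delta_t = t_post - t_pre (in ms)."""
    if delta_t > 0:  # presynaptic spike before postsynaptic spike: potentiation
        return a_plus * math.exp(-delta_t / tau_plus)
    if delta_t < 0:  # postsynaptic spike before presynaptic spike: depression
        return -a_minus * math.exp(delta_t / tau_minus)
    return 0.0

dw_causal = stdp_update(5.0)   # small positive change
dw_acausal = stdp_update(-5.0)  # small negative change
```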
Finding accurate solutions to the electronic Schrödinger equation plays an
important role in discovering molecular and material energies and
characteristics. Consequently, solving systems with large numbers of electrons
has become increasingly important. Variational Monte Carlo (VMC) methods,
especially those approximated through deep neural networks, are promising in
this regard. In this paper, we aim to integrate one such model called the
FermiNet, a post-Hartree-Fock (HF) Deep Neural Network (DNN) model, into a
standard and widely used open source library, DeepChem. We also propose novel
initialization techniques to overcome the difficulties associated with
assigning excess or missing electrons in ions.
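
The variational Monte Carlo principle behind FermiNet can be seen on the hydrogen atom with a one-parameter trial wavefunction $\psi = e^{-\alpha r}$, sampled with Metropolis moves. This is a textbook toy, not FermiNet or the DeepChem API: the exact ground state corresponds to $\alpha = 1$ (energy $-0.5$ Hartree), and any other $\alpha$ gives a higher expected energy.

```python
import numpy as np

def vmc_hydrogen(alpha, n_steps=20000, step=1.0, burn_in=1000, seed=0):
    """Metropolis VMC for hydrogen with psi = exp(-alpha r), in atomic units.

    Local energy: E_L(r) = -alpha^2 / 2 + (alpha - 1) / r.
    """
    rng = np.random.default_rng(seed)
    pos = np.array([1.0, 0.0, 0.0])
    r = np.linalg.norm(pos)
    energies = []
    for i in range(n_steps):
        trial = pos + step * rng.uniform(-1.0, 1.0, size=3)
        r_trial = np.linalg.norm(trial)
        # Accept with probability |psi(trial)|^2 / |psi(pos)|^2.
        if rng.random() < np.exp(-2.0 * alpha * (r_trial - r)):
            pos, r = trial, r_trial
        if i >= burn_in:
            energies.append(-0.5 * alpha**2 + (alpha - 1.0) / r)
    return float(np.mean(energies))

e_exact = vmc_hydrogen(1.0)  # local energy is constant, so this is exactly -0.5
e_trial = vmc_hydrogen(0.5)  # expected value is alpha^2/2 - alpha = -0.375
```

FermiNet replaces the one-parameter exponential with a neural network wavefunction over all electrons, but the sample-then-average-local-energy loop is the same idea.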
In this paper, we develop a deep learning-based bandwidth allocation policy
that is: 1) scalable with the number of users and 2) transferable to different
communication scenarios, such as non-stationary wireless channels, different
quality-of-service (QoS) requirements, and dynamically available resources. To
support scalability, the bandwidth allocation policy is represented by a graph
neural network (GNN), with which the number of training parameters does not
change with the number of users. To enable the generalization of the GNN, we
develop a hybrid-task meta-learning (HML) algorithm that trains the initial
parameters of the GNN with different communication scenarios during
meta-training. Next, during meta-testing, a few samples are used to fine-tune
the GNN with unseen communication scenarios. Simulation results demonstrate
that our HML approach can improve the initial performance by $8.79\%$, and
sampling efficiency by $73\%$, compared with existing benchmarks. After
fine-tuning, our near-optimal GNN-based policy can achieve close to the same
reward with much lower inference complexity compared to the optimal policy
obtained using iterative optimization.
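
The scalability argument can be illustrated with a single message-passing layer whose weight shapes do not depend on the number of users. The sketch assumes a fully connected user graph, mean aggregation, and a softmax that turns per-user scores into bandwidth fractions; the feature sizes and random weights are hypothetical, and no meta-learning is shown.

```python
import numpy as np

def gnn_bandwidth_policy(user_feats, w_self, w_neigh, w_out):
    """One message-passing layer over a fully connected user graph, then a softmax.

    The weights have fixed shapes, so the same policy applies to any number of
    users, and permuting the users permutes the allocation in the same way.
    """
    neigh_mean = user_feats.mean(axis=0, keepdims=True)  # aggregate over all users
    hidden = np.maximum(0.0, user_feats @ w_self + neigh_mean @ w_neigh)
    scores = hidden @ w_out                              # one score per user
    shares = np.exp(scores - scores.max())
    return shares / shares.sum()                         # bandwidth fractions sum to 1

rng = np.random.default_rng(0)
w_self, w_neigh, w_out = rng.normal(size=(4, 8)), rng.normal(size=(4, 8)), rng.normal(size=8)
feats3, feats10 = rng.normal(size=(3, 4)), rng.normal(size=(10, 4))
alloc3 = gnn_bandwidth_policy(feats3, w_self, w_neigh, w_out)
alloc10 = gnn_bandwidth_policy(feats10, w_self, w_neigh, w_out)
```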
Condition monitoring plays a significant role in the safety and reliability
of modern industrial systems. Artificial intelligence (AI) approaches are
gaining attention from academia and industry as a growing subject in industrial
applications and as a powerful way of identifying faults. This paper provides
an overview of intelligent condition monitoring and fault detection and
diagnosis methods for industrial plants with a focus on the open-source
benchmark Tennessee Eastman Process (TEP). In this survey, the most popular and
state-of-the-art deep learning (DL) and machine learning (ML) algorithms for
industrial plant condition monitoring, fault detection, and diagnosis are
summarized and the advantages and disadvantages of each algorithm are studied.
Challenges like imbalanced data, unlabelled samples and how deep learning
models can handle them are also covered. Finally, a comparison of the
accuracies and specifications of different algorithms utilizing the Tennessee
Eastman Process (TEP) is conducted. This research will be beneficial for both
researchers who are new to the field and experts, as it covers the literature
on condition monitoring and state-of-the-art methods alongside the challenges
and possible solutions to them.
Recent years have seen a surge of interest in the algorithmic estimation of
stochastic entropy production (EP) from trajectory data via machine learning. A
crucial element of such algorithms is the identification of a loss function
whose minimization guarantees accurate EP estimation. In this study, we
show that there exists a host of loss functions, namely those implementing a
variational representation of the $\alpha$-divergence, which can be used for
the EP estimation. By fixing $\alpha$ to a value between $-1$ and $0$, the
$\alpha$-NEEP (Neural Estimator for Entropy Production) exhibits a much more
robust performance against strong nonequilibrium driving or slow dynamics,
which adversely affect the existing method based on the Kullback-Leibler
divergence ($\alpha = 0$). In particular, the choice of $\alpha = -0.5$ tends
to yield the optimal results. To corroborate our findings, we present an
exactly solvable simplification of the EP estimation problem, whose loss
function landscape and stochastic properties give deeper intuition into the
robustness of the $\alpha$-NEEP.
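
For the Kullback-Leibler case ($\alpha = 0$), the role of the loss can be checked on a small discrete example. This sketch assumes the standard NEEP objective, maximizing the mean of $\Delta S - e^{-\Delta S}$ over antisymmetric estimators $\Delta S$, and verifies on a made-up three-state transition table that the optimum recovers $\log(P_{ij}/P_{ji})$. The $\alpha$-divergence loss of $\alpha$-NEEP is not reproduced here.

```python
import numpy as np

def fit_neep_kl(P, lr=1.0, iters=500):
    """Maximise J = sum_ij P[i,j] * (dS[i,j] - exp(-dS[i,j])) over antisymmetric dS.

    P[i, j] is the probability of observing the transition i -> j.
    The maximiser is dS[i, j] = log(P[i, j] / P[j, i]).
    """
    n = P.shape[0]
    dS = np.zeros((n, n))
    for _ in range(iters):
        for i in range(n):
            for j in range(i + 1, n):
                g = dS[i, j]
                # dJ/dg, using dS[j, i] = -g for the reverse transition.
                grad = P[i, j] * (1 + np.exp(-g)) - P[j, i] * (1 + np.exp(g))
                dS[i, j] = g + lr * grad
                dS[j, i] = -dS[i, j]
    return dS

# A driven three-state ring: forward hops are four times likelier than backward ones.
P = np.array([[0.0, 0.2, 0.05], [0.05, 0.0, 0.2], [0.2, 0.05, 0.0]])
P /= P.sum()
dS = fit_neep_kl(P)
mean_ep = float(np.sum(P * dS))  # estimated entropy production per step
```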
This paper presents a new type of hybrid model for Bayesian optimization (BO)
adept at managing mixed variables, encompassing both quantitative (continuous
and integer) and qualitative (categorical) types. Our proposed hybrid models
(named hybridM) merge a Monte Carlo Tree Search (MCTS) structure for
categorical variables with Gaussian Processes (GP) for continuous ones. hybridM
uses upper confidence bound tree search (UCTS) as its MCTS strategy, showing
how a tree architecture can be integrated into Bayesian optimization. Our
innovations, including dynamic online kernel selection in the surrogate
modeling phase and a unique UCTS search strategy, position our hybrid models as
an advancement in mixed-variable surrogate models. Numerical experiments
underscore the superiority of hybrid models, highlighting their potential in
Bayesian optimization.
Test log-likelihood is commonly used to compare different models of the same
data or different approximate inference algorithms for fitting the same
probabilistic model. We present simple examples demonstrating how comparisons
based on test log-likelihood can contradict comparisons according to other
objectives. Specifically, our examples show that (i) approximate Bayesian
inference algorithms that attain higher test log-likelihoods need not also
yield more accurate posterior approximations and (ii) conclusions about
forecast accuracy based on test log-likelihood comparisons may not agree with
conclusions based on root mean squared error.
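
A minimal numeric example of point (ii): with Gaussian predictive distributions, an overconfident model with a better point forecast can have a much lower test log-likelihood than a less accurate but well-spread one. The targets and model parameters are made up for illustration.

```python
import numpy as np

def gaussian_test_loglik(y, mean, std):
    """Average Gaussian predictive log-density of held-out targets."""
    return float(np.mean(-0.5 * np.log(2 * np.pi * std**2) - (y - mean) ** 2 / (2 * std**2)))

def rmse(y, mean):
    return float(np.sqrt(np.mean((y - mean) ** 2)))

y = np.array([1.0, -1.0, 1.0, -1.0])
# Model A: better point forecast (mean 0) but badly overconfident (std 0.1).
ll_a, rmse_a = gaussian_test_loglik(y, 0.0, 0.1), rmse(y, 0.0)  # ≈ -48.6, 1.0
# Model B: biased point forecast (mean 0.5) with a sensible spread (std 1).
ll_b, rmse_b = gaussian_test_loglik(y, 0.5, 1.0), rmse(y, 0.5)  # ≈ -1.54, ≈ 1.118
```

Ranking by test log-likelihood prefers model B, while ranking by RMSE prefers model A.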
Seven years ago, an unexpected nationwide shortage of radiologists was triggered by a single statement from Professor Geoffrey Hinton. The statement was: “I think if you work as a radiologist, you are like the Wile E. Coyote in the cartoon. You are already over the edge of the cliff, but you have not looked down yet.…
The post The AI radiologists replacement saga: Don’t be misled by the scaremongering – science v.s. science fiction appeared first on Data Science Central.
In an era of rapid digital transformation, leveraging data analytics and collaborative tools can be a game changer. One integration that is proving to be impactful is that of data analytics with Slack. This effective merger gives teams the ability to engage and make decisions based on real-time insights, in the long…
The post Unlocking team productivity: Integrating data analytics into your Slack workflow appeared first on Data Science Central.
Amazon Textract is a machine learning (ML) service that enables automatic extraction of text, handwriting, and data from scanned documents, surpassing traditional optical character recognition (OCR). It can identify, understand, and extract data from tables and forms with remarkable accuracy. Presently, several companies rely on manual extraction methods or basic OCR software, which is tedious […]
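
As a rough illustration of calling the service from Python, the sketch below uses the boto3 Textract client's `analyze_document` with table and form analysis enabled, plus a small helper that collects detected lines from the response. The file path is hypothetical, AWS credentials and region configuration are assumed, and the helper reads only `LINE` blocks; tables and forms arrive as other block types that need further parsing.

```python
def lines_from_textract(response):
    """Collect the text of every LINE block in a Textract response dictionary."""
    return [block["Text"] for block in response.get("Blocks", [])
            if block.get("BlockType") == "LINE"]

def analyze_scanned_document(path):
    """Run Textract on a local image with table and form analysis (needs AWS credentials)."""
    import boto3  # imported here so the parser above works without the AWS SDK installed
    client = boto3.client("textract")
    with open(path, "rb") as f:
        return client.analyze_document(Document={"Bytes": f.read()},
                                       FeatureTypes=["TABLES", "FORMS"])

# Offline example with a hand-written response fragment (not real service output).
sample = {"Blocks": [{"BlockType": "PAGE"},
                     {"BlockType": "LINE", "Text": "Invoice 42"},
                     {"BlockType": "WORD", "Text": "Invoice"}]}
lines = lines_from_textract(sample)
```

With credentials configured, `lines_from_textract(analyze_scanned_document("scan.png"))` would return the detected lines of a hypothetical `scan.png`.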
Neural construction models have shown promising performance for Vehicle
Routing Problems (VRPs) by adopting either the Autoregressive (AR) or
Non-Autoregressive (NAR) learning approach. While AR models produce
high-quality solutions, they generally have a high inference latency due to
their sequential generation nature. Conversely, NAR models generate solutions
in parallel with a low inference latency but generally exhibit inferior
performance. In this paper, we propose a generic Guided Non-Autoregressive
Knowledge Distillation (GNARKD) method to obtain high-performance NAR models
having a low inference latency. GNARKD removes the constraint of sequential
generation in AR models while preserving the learned pivotal components in the
network architecture to obtain the corresponding NAR models through knowledge
distillation. We evaluate GNARKD by applying it to three widely adopted AR
models to obtain NAR VRP solvers for both synthesized and real-world instances.
The experimental results demonstrate that GNARKD significantly reduces the
inference time (4-5 times faster) with an acceptable performance drop (2-3%). To
the best of our knowledge, this study is the first to obtain NAR VRP
solvers from AR ones through knowledge distillation.
( 3
min )
Exponential families are statistical models which are the workhorses in
statistics, information theory, and machine learning, among others. An
exponential family can either be normalized subtractively by its cumulant or
free energy function or equivalently normalized divisively by its partition
function. Both subtractive and divisive normalizers are strictly convex and
smooth functions inducing pairs of Bregman and Jensen divergences. It is
well known that skewed Bhattacharyya distances between probability densities
of an exponential family amount to skewed Jensen divergences induced by the
cumulant function between their corresponding natural parameters, and, in limit
cases, that the sided Kullback-Leibler divergences amount to reverse-sided
Bregman divergences. In this paper, we first show that the $\alpha$-divergences
between unnormalized densities of an exponential family amount to scaled
$\alpha$-skewed Jensen divergences induced by the partition function. We then
show how comparative convexity with respect to a pair of quasi-arithmetic means
allows one to deform both convex functions and their arguments, and thereby define
dually flat spaces with corresponding divergences when ordinary convexity is
preserved.
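The two "well-known" identities above can be checked numerically on a simple family. A minimal sketch, assuming the Poisson family (our choice of example, not the paper's) with natural parameter theta = log(lambda) and cumulant F(theta) = exp(theta): it compares the alpha-skewed Bhattacharyya coefficient, computed by summing the densities, with exp(-J_{F,alpha}), and compares the Kullback-Leibler divergence with the reverse-sided Bregman divergence B_F(theta2 : theta1). The paper's new partition-function result is not reproduced here.

```python
import math

def poisson_logpmf(k: int, lam: float) -> float:
    # log p(k) = k log(lam) - lam - log(k!), computed in log space to avoid overflow.
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def F(theta: float) -> float:
    # Cumulant (log-normalizer) of the Poisson family in natural parameter theta = log(lam).
    return math.exp(theta)

def skewed_jensen(t1: float, t2: float, alpha: float) -> float:
    # J_{F,alpha}(t1 : t2) = alpha F(t1) + (1 - alpha) F(t2) - F(alpha t1 + (1 - alpha) t2)
    return alpha * F(t1) + (1 - alpha) * F(t2) - F(alpha * t1 + (1 - alpha) * t2)

def bregman(t1: float, t2: float) -> float:
    # B_F(t1 : t2) = F(t1) - F(t2) - (t1 - t2) F'(t2), and F' = F = exp for this family.
    return F(t1) - F(t2) - (t1 - t2) * F(t2)

def bhattacharyya_coefficient(lam1: float, lam2: float, alpha: float, kmax: int = 100) -> float:
    # Sum over k of p1(k)^alpha * p2(k)^(1 - alpha); the tail beyond kmax is negligible here.
    return sum(math.exp(alpha * poisson_logpmf(k, lam1) + (1 - alpha) * poisson_logpmf(k, lam2))
               for k in range(kmax))

def kl_divergence(lam1: float, lam2: float, kmax: int = 100) -> float:
    # KL(p1 || p2) = sum_k p1(k) * (log p1(k) - log p2(k)), truncated at kmax.
    return sum(math.exp(poisson_logpmf(k, lam1)) * (poisson_logpmf(k, lam1) - poisson_logpmf(k, lam2))
               for k in range(kmax))

lam1, lam2, alpha = 3.0, 5.0, 0.3
t1, t2 = math.log(lam1), math.log(lam2)
bc_numeric = bhattacharyya_coefficient(lam1, lam2, alpha)
bc_closed = math.exp(-skewed_jensen(t1, t2, alpha))  # skewed Bhattacharyya distance = J_{F,alpha}
kl_numeric = kl_divergence(lam1, lam2)
kl_bregman = bregman(t2, t1)  # note the swapped (reverse-sided) arguments
```

For the Poisson family both sides also have closed forms, e.g. KL = lam1 log(lam1/lam2) - lam1 + lam2, which is exactly what `bregman(t2, t1)` evaluates to.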
( 2
min )
This paper presents the computational challenge on topological deep learning
that was hosted within the ICML 2023 Workshop on Topology and Geometry in
Machine Learning. The competition asked participants to provide open-source
implementations of topological neural networks from the literature by
contributing to the Python packages TopoNetX (data processing) and TopoModelX
(deep learning). The challenge attracted twenty-eight qualifying submissions in
its two-month duration. This paper describes the design of the challenge and
summarizes its main findings.
( 2
min )
As large language models (LLMs) like ChatGPT have gained traction, an
increasing number of news websites have begun utilizing them to generate
articles. However, not only can these language models produce factually
inaccurate articles on reputable websites, but disreputable news sites can also
utilize LLMs to mass-produce misinformation. To begin to understand this
phenomenon, we present one of the first large-scale studies of the prevalence
of synthetic articles within online news media. To do this, we train a
DeBERTa-based synthetic news detector and classify over 15.90 million articles
from 3,074 misinformation and mainstream news websites. We find that between
January 1, 2022, and May 1, 2023, the relative number of synthetic news
articles increased by 55.4% on mainstream websites while increasing by 457% on
misinformation sites. We find that this increase is largely driven by smaller,
less popular websites. Analyzing the impact of the release of ChatGPT using an
interrupted time series, we show that while its release resulted in a marked
increase in synthetic articles on small sites as well as misinformation news
websites, there was not a corresponding increase on large mainstream news
websites.
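Interrupted-time-series analyses like the one above are commonly done with segmented regression. A minimal sketch on a made-up, noiseless series (not the study's data), assuming the model y_t = b0 + b1 t + b2 D_t + b3 (t - t0) D_t, where D_t = 1 from the intervention time t0 onward (e.g. a model release date): b2 is the immediate level change and b3 the change in slope.

```python
import numpy as np

def fit_segmented_regression(y: np.ndarray, t0: int) -> np.ndarray:
    """Return [b0, b1, b2, b3] for y_t = b0 + b1*t + b2*D_t + b3*(t - t0)*D_t.

    D_t = 1 for t >= t0 (post-intervention) and 0 otherwise.
    """
    t = np.arange(len(y), dtype=float)
    d = (t >= t0).astype(float)
    # Design matrix columns: intercept, pre-existing trend, level change, slope change.
    X = np.column_stack([np.ones_like(t), t, d, (t - t0) * d])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Hypothetical series: trend of 0.1 per step, then a jump of +2 and an extra +0.5 per step from t0 = 10.
t = np.arange(20, dtype=float)
y = 1.0 + 0.1 * t + np.where(t >= 10, 2.0 + 0.5 * (t - 10), 0.0)
b0, b1, b2, b3 = fit_segmented_regression(y, t0=10)
```

On real, noisy counts the same design matrix applies, but inference on b2 and b3 would also need standard errors that account for autocorrelation.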
( 3
min )
2024 promises to be a breakout year for Generative AI (GenAI) and AI. However, there are two challenges that organizations will face in 2024 to “leverage AI to get value from their data.” Challenge #1: Too much focus is on “implementing AI” and not enough on gaining organizational alignment regarding where and how value will…
The post GenAI: Beware the Productivity Trap; It’s About Cultural Empowerment – Part 3 appeared first on Data Science Central.
( 22
min )
AutoML platforms have numerous options for the algorithms to try for each
step of the analysis, i.e., different possible algorithms for imputation,
transformations, feature selection, and modelling. Finding the optimal
combination of algorithms and hyper-parameter values is computationally
expensive, as the number of combinations to explore leads to an exponential
explosion of the space. In this paper, we present the Sequential
Hyper-parameter Space Reduction (SHSR) algorithm that reduces the space for an
AutoML tool with negligible drop in its predictive performance. SHSR is a
meta-level learning algorithm that analyzes past runs of an AutoML tool on
several datasets and learns which hyper-parameter values to filter out from
consideration on a new dataset to analyze. SHSR is evaluated on 284
classification and 375 regression problems, showing an approximate 30%
reduction in execution time with a performance drop of less than 0.1%.
( 2
min )
The privacy-utility tradeoff remains one of the fundamental issues of
differentially private machine learning. This paper introduces a geometrically
inspired kernel-based approach to mitigate the accuracy-loss issue in
classification. In this approach, a representation of the affine hull of given
data points is learned in Reproducing Kernel Hilbert Spaces (RKHS). This leads
to a novel distance measure that hides privacy-sensitive information about
individual data points and improves the privacy-utility tradeoff by
significantly reducing the risk of membership inference attacks. The
effectiveness of the approach is demonstrated through experiments on the MNIST
dataset, the Freiburg groceries dataset, and a real biomedical dataset. It is
verified that the approach remains computationally practical. The application
of the approach to federated learning is considered and it is observed that the
accuracy-loss due to data being distributed is either marginal or not
significantly high.
( 2
min )
Out-Of-Distribution (OOD) generalization is an essential topic in machine
learning. However, recent research has focused only on the corresponding
methods for neural networks. This paper introduces a novel and effective
solution for OOD generalization of decision tree models, named Invariant
Decision Tree (IDT). IDT enforces a penalty term with regard to the
unstable/varying behavior of a split across different environments during the
growth of the tree. We also construct its ensemble version, the Invariant
Random Forest (IRF). Our proposed method is motivated by a theoretical result under
mild conditions, and validated by numerical tests with both synthetic and real
datasets. The superior performance compared to non-OOD tree models implies that
considering OOD generalization for tree models is necessary and
should be given more attention.
( 2
min )
We introduce a novel computational unit for neural networks that features
multiple biases, challenging the traditional perceptron structure. This unit
emphasizes the importance of preserving uncorrupted information as it is passed
from one unit to the next, applying activation functions later in the process
with specialized biases for each unit. Through both empirical and theoretical
analyses, we show that by focusing on increasing biases rather than weights,
there is potential for significant enhancement in a neural network model's
performance. This approach offers an alternative perspective on optimizing
information flow within neural networks. See source code at
https://github.com/CuriosAI/dac-dev.
( 2
min )
Bayesian Optimization (BO) is typically used to optimize an unknown function
$f$ that is noisy and costly to evaluate, by exploiting an acquisition function
that must be maximized at each optimization step. Although provably
asymptotically optimal BO algorithms are efficient at optimizing
low-dimensional functions, scaling them to high-dimensional spaces remains an
open problem, often tackled by assuming an additive structure for $f$. By doing
so, BO algorithms typically introduce additional restrictive assumptions on the
additive structure that reduce their applicability domain. This paper contains
two main contributions: (i) we relax the restrictive assumptions on the
additive structure of $f$ without weakening the maximization guarantees of the
acquisition function, and (ii) we address the over-exploration problem for
decentralized BO algorithms. To these ends, we propose DuMBO, an asymptotically
optimal decentralized BO algorithm that achieves very competitive performance
against state-of-the-art BO algorithms, especially when the additive structure
of $f$ comprises high-dimensional factors.
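To make the additive assumption concrete: if f(x) = sum_i f_i(x_{A_i}), with each f_i drawn from an independent Gaussian process on a subset A_i of the coordinates, then the covariance of f is the sum of the per-factor kernels. A minimal sketch, assuming RBF kernels and a hypothetical split of a 6-D input into factors (the factor sets, lengthscale, and sizes are illustrative only and are not DuMBO's).

```python
import numpy as np

def rbf(X: np.ndarray, Y: np.ndarray, lengthscale: float = 1.0) -> np.ndarray:
    # Squared-exponential kernel between the rows of X and the rows of Y.
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def additive_kernel(X, Y, factors, lengthscale=1.0):
    # k(x, y) = sum_i k_i(x[A_i], y[A_i]): each factor only sees its own coordinates.
    return sum(rbf(X[:, A], Y[:, A], lengthscale) for A in factors)

factors = [[0, 1], [1, 2, 3], [4, 5]]  # hypothetical factor sets A_i (here they overlap)
rng = np.random.default_rng(0)
X = rng.uniform(size=(30, 6))
K = additive_kernel(X, X, factors)
```

Each term depends only on a low-dimensional slice of the input, which is what additive BO methods exploit; the resulting Gram matrix is still a valid (symmetric, positive semi-definite) covariance.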
( 2
min )
Real-time and accurate traffic flow prediction is the foundation for ensuring
the efficient operation of intelligent transportation systems. In existing
traffic flow prediction methods based on graph neural networks (GNNs),
pre-defined graphs were usually used to describe the spatial correlations of
different traffic nodes in urban road networks. However, the ability of
pre-defined graphs used to describe spatial correlation was limited by prior
knowledge and graph generation methods. Although time-varying graphs based on
data-driven learning can partially overcome the drawbacks of pre-defined
graphs, the learning ability of existing adaptive graphs was limited. For
example, time-varying graphs cannot adequately capture the inherent spatial
correlations in traffic flow data. To solve these problems, we
propose a hybrid time-varying graph neural network (HTVGNN) for traffic flow
prediction.
( 2
min )
Neural network wavefunctions optimized using the variational Monte Carlo
method have been shown to produce highly accurate results for the electronic
structure of atoms and small molecules, but the high cost of optimizing such
wavefunctions prevents their application to larger systems. We propose the
Subsampled Projected-Increment Natural Gradient Descent (SPRING) optimizer to
reduce this bottleneck. SPRING combines ideas from the recently introduced
minimum-step stochastic reconfiguration optimizer (MinSR) and the classical
randomized Kaczmarz method for solving linear least-squares problems. We
demonstrate that SPRING outperforms both MinSR and the popular
Kronecker-Factored Approximate Curvature method (KFAC) across a number of small
atoms and molecules, given that the learning rates of all methods are optimally
tuned. For example, on the oxygen atom, SPRING attains chemical accuracy after
forty thousand training iterations, whereas both MinSR and KFAC fail to do so
even after one hundred thousand iterations.
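For reference, the classical randomized Kaczmarz method mentioned above solves a consistent linear system Ax = b by projecting the current iterate onto the hyperplane of one randomly chosen row at a time, sampling row i with probability proportional to ||a_i||^2 (the Strohmer-Vershynin variant). A minimal sketch of that classical building block only; it is not SPRING itself, and the problem sizes are arbitrary.

```python
import numpy as np

def randomized_kaczmarz(A: np.ndarray, b: np.ndarray, iters: int = 5000, seed: int = 0) -> np.ndarray:
    """Solve a consistent system A x = b by randomized Kaczmarz projections."""
    rng = np.random.default_rng(seed)
    row_norms_sq = np.einsum("ij,ij->i", A, A)
    probs = row_norms_sq / row_norms_sq.sum()          # sample rows proportional to ||a_i||^2
    rows = rng.choice(A.shape[0], size=iters, p=probs)
    x = np.zeros(A.shape[1])
    for i in rows:
        # Project x onto the hyperplane {x : a_i . x = b_i}.
        x += (b[i] - A[i] @ x) / row_norms_sq[i] * A[i]
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((200, 20))
x_true = rng.standard_normal(20)
x_hat = randomized_kaczmarz(A, A @ x_true)
```

Each step touches a single row, so the per-iteration cost is independent of the number of rows; the expected error contracts geometrically at a rate set by sigma_min(A)^2 / ||A||_F^2.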
( 2
min )
Generating explanations for reinforcement learning (RL) is challenging as
actions may produce long-term effects on the future. In this paper, we develop
a novel framework for explainable RL by learning a causal world model without
prior knowledge of the causal structure of the environment. The model captures
the influence of actions, allowing us to interpret the long-term effects of
actions through causal chains, which present how actions influence
environmental variables and finally lead to rewards. Unlike most
explanatory models, which suffer from low accuracy, our model remains accurate
while improving explainability, making it applicable in model-based learning.
As a result, we demonstrate that our causal model can serve as the bridge
between explainability and learning.
( 2
min )
This paper examines some common problems in Human-Robot Interaction (HRI)
that cause failures and troubles in chat. The design decisions for a given use
case start with choosing a suitable robot and a suitable chat model, then
identifying common problems that cause failures, identifying potential solutions,
and planning continuous improvement. In conclusion, it is recommended to use a
closed-loop control algorithm that guides the use of pre-trained Artificial
Intelligence (AI) models and provides vocabulary filtering, re-trains batched
models on new datasets, learns online from data streams, and/or uses
reinforcement learning models to self-update the trained models and reduce errors.
( 2
min )
We propose an adjusted Wasserstein distributionally robust estimator, based
on a nonlinear transformation of the Wasserstein distributionally robust (WDRO)
estimator in statistical learning. The classic WDRO estimator is asymptotically
biased, while our adjusted WDRO estimator is asymptotically unbiased, resulting
in a smaller asymptotic mean squared error. Meanwhile, the proposed adjusted
WDRO has an out-of-sample performance guarantee. Further, under certain
conditions, our proposed adjustment technique provides a general principle to
de-bias asymptotically biased estimators. Specifically, we investigate how
the adjusted WDRO estimator is developed in the generalized linear model,
including logistic regression, linear regression, and Poisson regression.
Numerical experiments demonstrate the favorable practical performance of the
adjusted estimator over the classic one.
( 2
min )
Protein post-translational modification (PTM) site prediction is a
fundamental task in bioinformatics. Several computational methods have been
developed to predict PTM sites. However, existing methods ignore structural
information and merely utilize protein sequences. Furthermore, a more
fine-grained structure representation learning method is urgently needed, as PTM
is a biological event that occurs at atomic granularity. In this paper, we
propose a PTM site prediction method by Coupling of Multi-Granularity structure
and Multi-Scale sequence representation, PTM-CMGMS for brevity. Specifically,
multigranularity structure-aware representation learning is designed to learn
neighborhood structure representations at the amino acid, atom, and whole
protein granularity from AlphaFold predicted structures, followed by utilizing
contrastive learning to optimize the structure representations. Additionally,
multi-scale sequence representation learning is used to extract context
sequence information, and a motif generated by aligning all context sequences of
PTM sites assists the prediction. Extensive experiments on three datasets show
that PTM-CMGMS outperforms the state-of-the-art methods.
( 2
min )
We propose a new algorithm for the problem of recovering data that adheres to
multiple, heterogeneous low-dimensional structures from linear observations.
Focusing on data matrices that are simultaneously row-sparse and low-rank, we
propose and analyze an iteratively reweighted least squares (IRLS) algorithm
that is able to leverage both structures. In particular, it optimizes a
combination of non-convex surrogates for row-sparsity and rank, a balancing of
which is built into the algorithm. We prove locally quadratic convergence of
the iterates to a simultaneously structured data matrix in a regime of minimal
sample complexity (up to constants and a logarithmic factor), which is known to
be impossible for a combination of convex surrogates. In experiments, we show
that the IRLS method exhibits favorable empirical convergence, identifying
simultaneously row-sparse and low-rank matrices from fewer measurements than
state-of-the-art methods. Code is available at
https://github.com/ckuemmerle/simirls.
( 2
min )
Collective motion is a ubiquitous phenomenon in nature, inspiring engineers,
physicists and mathematicians to develop mathematical models and bio-inspired
designs. Collective motion at small to medium group sizes ($\sim$10-1000
individuals, also called the 'mesoscale') can show nontrivial features due to
stochasticity. Therefore, characterizing both the deterministic and stochastic
aspects of the dynamics is crucial in the study of mesoscale collective
phenomena. Here, we use a physics-inspired, neural-network based approach to
characterize the stochastic group dynamics of interacting individuals, through
a stochastic differential equation (SDE) that governs the collective dynamics
of the group. We apply this technique to both synthetic and real-world
datasets, and identify the deterministic and stochastic aspects of the dynamics
using drift and diffusion fields, enabling us to make novel inferences about
the nature of order in these systems.
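The neural-network estimator in the abstract is not reproduced here, but the quantities it targets, drift and diffusion, can be illustrated with the classical Kramers-Moyal estimates on a simulated one-dimensional Ornstein-Uhlenbeck process dx = -theta x dt + sigma dW: the drift is the conditional mean of dx/dt, and the diffusion is the mean of (dx)^2/dt. A minimal sketch, assuming a linear drift and constant diffusion; all parameter values are made up.

```python
import numpy as np

def simulate_ou(theta=1.0, sigma=0.5, dt=0.01, n=200_000, seed=0):
    """Euler-Maruyama path of the Ornstein-Uhlenbeck SDE dx = -theta*x dt + sigma dW."""
    rng = np.random.default_rng(seed)
    dw = rng.standard_normal(n - 1) * np.sqrt(dt)
    x = np.zeros(n)
    for k in range(n - 1):
        x[k + 1] = x[k] - theta * x[k] * dt + sigma * dw[k]
    return x

def kramers_moyal_linear(x, dt):
    """Estimate a linear drift a*x and a constant diffusion sigma^2 from one sampled path."""
    dx = np.diff(x)
    # First coefficient: least-squares fit of E[dx/dt | x] = a*x (no intercept).
    a = np.sum(x[:-1] * dx) / (dt * np.sum(x[:-1] ** 2))
    # Second coefficient: E[dx^2]/dt approaches sigma^2 as dt -> 0.
    sigma_sq = np.mean(dx ** 2) / dt
    return a, sigma_sq

dt = 0.01
path = simulate_ou(theta=1.0, sigma=0.5, dt=dt)
a_hat, sigma_sq_hat = kramers_moyal_linear(path, dt)
```

With theta = 1 and sigma = 0.5, `a_hat` should land near -1 and `sigma_sq_hat` near 0.25; a neural estimator replaces the fixed linear/constant forms with learned drift and diffusion fields.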
( 2
min )
In this work, we introduce ChatQA, a family of conversational question
answering (QA) models that obtain GPT-4-level accuracy. Specifically, we
propose a two-stage instruction tuning method that can significantly improve
the zero-shot conversational QA results from large language models (LLMs). To
handle retrieval in conversational QA, we fine-tune a dense retriever on a
multi-turn QA dataset, which provides comparable results to using the
state-of-the-art query rewriting model while largely reducing deployment cost.
Notably, our ChatQA-70B can outperform GPT-4 in terms of average score on 10
conversational QA datasets (54.14 vs. 53.90), without relying on any synthetic
data from OpenAI GPT models.
( 2
min )
Atrial fibrillation (AF) is a common cardiac arrhythmia characterized by
rapid and irregular contractions of the atria. It significantly elevates the
risk of strokes due to slowed blood flow in the atria, especially in the left
atrial appendage, which is prone to blood clot formation. Such clots can
migrate into cerebral arteries, leading to ischemic stroke. To assess whether
AF patients should be prescribed anticoagulants, doctors often use the
CHA2DS2-VASc scoring system. However, anticoagulant use must be approached with
caution as it can impact clotting functions. This study introduces a machine
learning algorithm that predicts whether patients with AF should be recommended
anticoagulant therapy using 12-lead ECG data. In this model, we use STOME to
enhance time-series data and then process it through a Convolutional Neural
Network (CNN). By incorporating a path development layer, the model achieves a
specificity of 30.6% under the condition of an NPV of 1. In contrast, LSTM
algorithms without path development yield a specificity of only 2.7% under the
same NPV condition.
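To make the reported operating point concrete: NPV = TN / (TN + FN) and specificity = TN / (TN + FP). Requiring NPV = 1 means that no patient who should receive anticoagulants may be classified as negative, so the decision threshold can be no higher than the lowest score among positives. A minimal sketch on hypothetical scores, assuming "positive" means "anticoagulant therapy recommended" and that a patient is predicted negative when the score is below the threshold.

```python
def specificity_at_npv_one(scores, labels):
    """Best specificity achievable with NPV = 1 for a threshold rule.

    A patient is predicted negative when score < threshold. NPV = 1 requires
    that no positive (label 1) falls below the threshold, so the largest
    admissible threshold is the minimum score among positives.
    """
    pos_scores = [s for s, y in zip(scores, labels) if y == 1]
    neg_scores = [s for s, y in zip(scores, labels) if y == 0]
    threshold = min(pos_scores)
    true_negatives = sum(1 for s in neg_scores if s < threshold)
    return true_negatives / len(neg_scores)

scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.90]
labels = [0,    0,    0,    1,    0,    1,    0,    1]
print(specificity_at_npv_one(scores, labels))  # 0.6
```

A single high-scoring negative or low-scoring positive moves this number a lot, which is why specificity at NPV = 1 is a demanding metric.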
( 2
min )
Magnetic navigation (MagNav) is a rising alternative to the Global
Positioning System (GPS) and has proven useful for aircraft navigation.
Traditional aircraft navigation systems, while effective, face limitations in
precision and reliability in certain environments and against attacks. Airborne
MagNav leverages the Earth's magnetic field to provide accurate positional
information. However, external magnetic fields induced by aircraft electronics
and Earth's large-scale magnetic fields disrupt the weaker signal of interest.
We introduce a physics-informed approach using Tolles-Lawson coefficients for
compensation and Liquid Time-Constant Networks (LTCs) to remove complex, noisy
signals derived from the aircraft's magnetic sources. Using real flight data
with magnetometer measurements and aircraft measurements, we observe up to a
64% reduction in aeromagnetic compensation error (RMSE nT), outperforming
conventional models. This significant improvement underscores the potential of
a physics-informed, machine learning approach for extracting clean, reliable,
and accurate magnetic signals for MagNav positional estimation.
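Tolles-Lawson compensation models the aircraft's interference field as a linear combination of terms built from the direction cosines u of the measured vector field: permanent terms (u), induced terms (|B| times products u_i u_j) and eddy-current terms (|B| times u_i times the time derivative of u_j). A minimal sketch on synthetic data, assuming an 18-term variant; implementations differ (some drop one induced and/or one eddy term because of near-collinearity, and most band-pass filter the signals), so this is not the paper's pipeline.

```python
import numpy as np

def tolles_lawson_matrix(bx, by, bz, dt=1.0):
    """Build an 18-column Tolles-Lawson design matrix from vector magnetometer data."""
    bt = np.sqrt(bx**2 + by**2 + bz**2)
    u = np.stack([bx, by, bz], axis=1) / bt[:, None]           # direction cosines
    du = np.gradient(u, dt, axis=0)                             # their time derivatives
    permanent = u                                               # 3 terms
    iu, ju = np.triu_indices(3)
    induced = bt[:, None] * u[:, iu] * u[:, ju]                 # 6 terms: u_i u_j with i <= j
    eddy = bt[:, None] * (u[:, :, None] * du[:, None, :]).reshape(len(bt), 9)  # 9 terms
    return np.hstack([permanent, induced, eddy])

# Hypothetical flight: slowly rotating field in the aircraft frame plus small maneuvers (nT).
t = np.linspace(0, 100, 2000)
dt = t[1] - t[0]
bx = 50_000 * np.cos(0.05 * t) + 500 * np.sin(0.7 * t)
by = 50_000 * np.sin(0.05 * t) + 500 * np.cos(0.3 * t)
bz = 20_000 + 800 * np.sin(0.2 * t)
A = tolles_lawson_matrix(bx, by, bz, dt)

rng = np.random.default_rng(0)
c_true = rng.normal(size=18) * 1e-3                             # made-up coefficients
interference = A @ c_true
c_hat, *_ = np.linalg.lstsq(A, interference, rcond=None)       # least-squares calibration
residual = interference - A @ c_hat
```

In practice the fitted interference `A @ c_hat` is subtracted from the scalar magnetometer reading; the abstract's LTC network then targets what this linear model leaves behind.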
( 2
min )
Recommendation systems are of great interest to technology companies
nowadays. Businesses constantly grow their user bases and product catalogs,
causing the number of users and items to continuously increase over time, to
very large numbers. Traditional recommendation algorithms, whose complexity
depends on the number of users and items, are therefore difficult to adapt to
the industrial environment. In this paper, we introduce a new method applying
graph neural networks with a contrastive learning framework to extract user
preferences. We incorporate a soft clustering architecture that significantly
reduces the computational cost of the inference process. Experiments show that
the model is able to learn user preferences with low computational cost in both
the training and prediction phases. At the same time, the model achieves very
good accuracy. We call this architecture EfficientRec, with the implication of
model compactness and the ability to scale to unlimited users and products.
( 2
min )
This work introduces a framework to address the computational complexity
inherent in Mixed-Integer Programming (MIP) models by harnessing the potential
of deep learning. We compare the effectiveness of (a) feed-forward neural
networks (ANN) and (b) convolutional neural networks (CNN) in approximating the
active dimensions within MIP problems. We utilize multi-label classification to
account for more than one active dimension. To enhance the framework's
performance, we employ Bayesian optimization for hyperparameter tuning, aiming
to maximize sample-level accuracy. The primary objective is to train the neural
networks to predict all active dimensions accurately, thereby maximizing the
occurrence of global optimum solutions. We apply this framework to a flow-based
facility location allocation Mixed-Integer Linear Programming (MILP)
formulation that describes long-term investment planning and medium-term
tactical planning in a personalized medicine supply chain for cell therapy
manufacturing and distribution.
( 2
min )
Cloud radiative feedback impacts early tropical cyclone (TC) intensification,
but limitations in existing diagnostic frameworks make them unsuitable for
studying asymmetric or transient radiative heating. We propose a linear
Variational Encoder-Decoder (VED) to learn the hidden relationship between
radiation and the surface intensification of realistic simulated TCs. Limiting
VED model inputs enables using its uncertainty to identify periods when
radiation has more importance for intensification. A close examination of the
extracted 3D radiative structures suggests that longwave radiative forcing from
inner core deep convection and shallow clouds both contribute to
intensification, with the deep convection having the most impact overall. We
find that deep convection downwind of the shallow clouds is critical to the
intensification of Typhoon Haiyan. Our work demonstrates that machine learning can
discover thermodynamic-kinematic relationships without relying on axisymmetric
or deterministic assumptions, paving the way towards the objective discovery of
processes leading to TC intensification in realistic conditions.
( 2
min )
In this paper, we introduce eipy--an open-source Python package for
developing effective, multi-modal heterogeneous ensembles for classification.
eipy provides a framework that is both rigorous and user-friendly for
comparing and selecting the best-performing multi-modal data integration and
predictive modeling methods by systematically evaluating their performance
using nested cross-validation. The package is designed to leverage
scikit-learn-like estimators as components to build multi-modal predictive
models. An up-to-date user guide, including API reference and tutorials, for
eipy is maintained at https://eipy.readthedocs.io . The main repository for
this project can be found on GitHub at https://github.com/GauravPandeyLab/eipy .
( 2
min )
In this paper we propose a new non-linear classifier based on a combination
of locally linear classifiers. We cast the problem as an $\ell_1$ Multiple
Kernel Learning (MKL) problem using many locally linear kernels, which yields a
well-known optimization formulation. Since the number of such kernels is huge, we
provide a scalable generic MKL training algorithm handling streaming kernels.
With respect to inference time, the resulting classifier fills the gap
between high-accuracy but slow non-linear classifiers (such as classical MKL)
and fast but low-accuracy linear classifiers.
( 2
min )
In the realm of robot action recognition, identifying distinct but spatially
proximate arm movements using vision systems in noisy environments poses a
significant challenge. This paper studies robot arm action recognition in noisy
environments using machine learning techniques. Specifically, a vision system
is used to track the robot's movements followed by a deep learning model to
extract the arm's key points. Through a comparative analysis of machine
learning methods, the effectiveness and robustness of this model are assessed
in noisy environments. A case study was conducted using the Tic-Tac-Toe game in
a 3-by-3 grid environment, where the focus is to accurately identify the
actions of the arms in selecting specific locations within this constrained
environment. Experimental results show that our approach can achieve precise
key point detection and action classification despite the addition of noise and
uncertainties to the dataset.
( 2
min )
The advent of large language models (LLMs) such as ChatGPT has attracted
considerable attention in various domains due to their remarkable performance
and versatility. As the use of these models continues to grow, the importance
of effective prompt engineering has come to the fore. Prompt optimization
emerges as a crucial challenge, as it has a direct impact on model performance
and the extraction of relevant information. Recently, evolutionary algorithms
(EAs) have shown promise in addressing this issue, paving the way for novel
optimization strategies. In this work, we propose an evolutionary
multi-objective (EMO) approach specifically tailored for prompt optimization,
called EMO-Prompts, and use sentiment analysis as a case study. Our results
demonstrate that
EMO-Prompts effectively generates prompts capable of guiding the LLM to produce
texts embodying two conflicting emotions simultaneously.
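In a multi-objective setting like this one, each candidate prompt receives one score per objective (here, one per target emotion), and selection favors the non-dominated set. A minimal sketch of Pareto filtering with all objectives maximized, on hypothetical scores; EMO-Prompts' actual selection and variation operators are not reproduced.

```python
from typing import List, Tuple

def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    # a dominates b if it is at least as good on every objective and strictly better on one.
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(scores: List[Tuple[float, ...]]) -> List[int]:
    """Indices of non-dominated candidates (all objectives maximized)."""
    return [i for i, s in enumerate(scores)
            if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)]

# Hypothetical (joy, sadness) scores for five candidate prompts.
scores = [(0.9, 0.1), (0.6, 0.6), (0.5, 0.5), (0.2, 0.8), (0.1, 0.2)]
print(pareto_front(scores))  # [0, 1, 3]
```

Prompt 1 is the kind of candidate the abstract is after: it scores well on both conflicting emotions without being beaten on either.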
( 2
min )
In the field of scientific computing, many problem-solving approaches tend to
focus only on the process and the final outcome. Even in AI for science, there
is a lack of deep mining of the multimodal information behind the data, and no
multimodal framework akin to those in the image-text domain. In this paper, we
take Symbolic Regression (SR) as our focal point and, drawing inspiration from
the BLIP model in the image-text domain, propose a scientific computing
multimodal framework based on Function Images (Funcimg) and Operation Tree
Sequence (OTS), named Bootstrapping OTS-Funcimg Pre-training Model (Botfip). In
SR experiments, we validate the advantages of Botfip in low-complexity SR
problems, showcasing its potential. As a MED framework, Botfip holds promise
for future applications in a broader range of scientific computing problems.
( 2
min )
Hierarchical federated learning (HFL) enables distributed training of models
across multiple devices with the help of several edge servers and a cloud edge
server in a privacy-preserving manner. In this paper, we consider HFL with
highly mobile devices, mainly targeting vehicular networks. Through
convergence analysis, we show that mobility influences the convergence speed by
both fusing the edge data and shuffling the edge models. While mobility is
usually considered as a challenge from the perspective of communication, we
prove that it increases the convergence speed of HFL with edge-level
heterogeneous data, since more diverse data can be incorporated. Furthermore,
we demonstrate that a higher speed leads to faster convergence, since it
accelerates the fusion of data. Simulation results show that mobility increases
the model accuracy of HFL by up to 15.1% when training a convolutional neural
network on the CIFAR-10 dataset.
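A minimal sketch of two-level aggregation in HFL, assuming FedAvg-style weighting by local sample counts at both the edge and the cloud level (a common choice, not necessarily the paper's). Mobility enters through the device-to-edge assignment: a vehicle that moves is aggregated by a different edge server in the next round. Models are plain NumPy vectors and all numbers are hypothetical.

```python
import numpy as np

def weighted_average(models, weights):
    # Weighted mean of parameter vectors (FedAvg-style).
    return np.average(np.stack(models), axis=0, weights=np.asarray(weights, dtype=float))

def hierarchical_aggregate(device_models, device_sizes, assignment, num_edges):
    """Edge servers average their attached devices; the cloud averages the edge models.

    assignment[d] is the edge server that device d is connected to in this round.
    """
    edge_models, edge_sizes = [], []
    for e in range(num_edges):
        members = [d for d, a in enumerate(assignment) if a == e]
        if not members:
            continue  # an edge with no attached vehicle contributes nothing this round
        edge_models.append(weighted_average([device_models[d] for d in members],
                                            [device_sizes[d] for d in members]))
        edge_sizes.append(sum(device_sizes[d] for d in members))
    return weighted_average(edge_models, edge_sizes)

models = [np.array([1.0, 0.0]), np.array([3.0, 2.0]), np.array([0.0, 4.0])]
sizes = [10, 30, 60]
round_1 = hierarchical_aggregate(models, sizes, assignment=[0, 0, 1], num_edges=2)
round_2 = hierarchical_aggregate(models, sizes, assignment=[0, 1, 1], num_edges=2)  # device 1 moved
```

With sample-count weighting at both levels, a single cloud step equals the global weighted average whatever the assignment (both calls return [1.0, 3.0]). Mobility matters because edges typically aggregate several times between cloud synchronizations, so which devices share an edge shapes the intermediate edge models.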
( 2
min )
We demonstrate and evaluate a fully-blind digital signal processing (DSP)
chain for 100G passive optical networks (PONs), and analyze different equalizer
topologies based on neural networks with low hardware complexity.
( 2
min )
This paper presents VoxCeleb-ESP, a collection of pointers and timestamps to
YouTube videos facilitating the creation of a novel speaker recognition
dataset. VoxCeleb-ESP captures real-world scenarios, incorporating diverse
speaking styles, noises, and channel distortions. It includes 160 Spanish
celebrities spanning various categories, ensuring a representative distribution
across age groups and geographic regions in Spain. We provide two speaker trial
lists for speaker identification tasks, each of them with same-video or
different-video target trials respectively, accompanied by a cross-lingual
evaluation of ResNet pretrained models. Preliminary speaker identification
results suggest that the complexity of the detection task in VoxCeleb-ESP is
equivalent to that of the original and much larger VoxCeleb in English.
VoxCeleb-ESP contributes to the expansion of speaker recognition benchmarks
with a comprehensive and diverse dataset for the Spanish language.
( 2
min )
The quality of recorded videos and images is significantly influenced by the
camera's field of view (FOV). In critical applications like surveillance
systems and self-driving cars, an inadequate FOV can give rise to severe safety
and security concerns, including car accidents and thefts due to the failure to
detect individuals and objects. The conventional methods for establishing the
correct FOV heavily rely on human judgment and lack automated mechanisms to
assess video and image quality based on FOV. In this paper, we introduce an
innovative approach that harnesses semantic line detection and classification
alongside deep Hough transform to identify semantic lines, thus ensuring a
suitable FOV by understanding 3D view through parallel lines. Our approach
yields an effective F1 score of 0.729 on the public EgoCart dataset, coupled
with a notably high median score in the line placement metric. We illustrate
that our method offers a straightforward means of assessing the quality of the
camera's field of view, achieving a classification accuracy of 83.8%. This
metric can serve as a proxy for evaluating the potential performance of video
and image quality applications.
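The deep Hough transform builds on the parameterization of the classical Hough transform, where a line is written as rho = x cos(theta) + y sin(theta) and each edge point votes for every (theta, rho) cell consistent with it. A minimal sketch of that classical voting step only (not the paper's deep model), on a hypothetical set of points lying mostly on the line y = 5.

```python
import numpy as np

def hough_lines(points, num_theta=180, rho_res=1.0):
    """Classical Hough voting: return (theta, rho, votes) of the strongest line.

    Lines are parameterized as rho = x*cos(theta) + y*sin(theta), theta in [0, pi).
    """
    pts = np.asarray(points, dtype=float)
    thetas = np.arange(num_theta) * np.pi / num_theta
    max_rho = np.abs(pts).sum(axis=1).max()                 # |rho| <= |x| + |y|
    rho_bins = np.arange(-max_rho, max_rho + rho_res, rho_res)
    acc = np.zeros((num_theta, len(rho_bins)), dtype=int)   # accumulator over (theta, rho)
    for x, y in pts:
        rhos = x * np.cos(thetas) + y * np.sin(thetas)      # one rho per theta
        idx = np.round((rhos + max_rho) / rho_res).astype(int)
        acc[np.arange(num_theta), idx] += 1
    t, r = np.unravel_index(np.argmax(acc), acc.shape)
    return thetas[t], rho_bins[r], acc[t, r]

points = [(x, 5) for x in range(41)] + [(3, 30), (20, -10)]  # a horizontal line plus two outliers
theta, rho, votes = hough_lines(points)
```

Here the peak lands at theta = pi/2 and rho = 5 with 41 votes; semantic-line methods score such candidate lines with learned features instead of raw vote counts.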
( 2
min )
Advancements in machine learning (ML) have significantly revolutionized
medical image analysis, prompting hospitals to rely on external ML services.
However, the exchange of sensitive patient data, such as chest X-rays, poses
inherent privacy risks when shared with third parties. Addressing this concern,
we propose MedBlindTuner, a privacy-preserving framework leveraging fully
homomorphic encryption (FHE) and a data-efficient image transformer (DEiT).
MedBlindTuner enables the training of ML models exclusively on FHE-encrypted
medical images. Our experimental evaluation demonstrates that MedBlindTuner
achieves comparable accuracy to models trained on non-encrypted images,
offering a secure solution for outsourcing ML computations while preserving
patient data privacy. To the best of our knowledge, this is the first work that
uses data-efficient image transformers and fully homomorphic encryption in this
domain.
( 2
min )
Expensive ultrasonic anemometers are usually required to measure wind speed
accurately. The aim of this work is to overcome the loss of accuracy of a
low-cost hot-wire anemometer caused by changes in air temperature, by means of
a probabilistic calibration using Gaussian Process Regression. Gaussian Process
Regression is a non-parametric, Bayesian, and supervised learning method
designed to predict an unknown target variable as a function of one or more
known input variables. Our approach is validated against real datasets,
achieving good performance in inferring the actual wind speed values.
Calibrating the hot-wire anemometer with air temperature taken into account,
before deploying it in the field, allows wind speed to be estimated over the
typical range of ambient temperatures, together with a grounded uncertainty
estimate for each speed measurement.
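As a rough illustration of how such a temperature-aware calibration could look, here is a minimal NumPy sketch of Gaussian Process Regression. It assumes the inputs are the raw hot-wire reading and the air temperature and the target is the wind speed from a reference anemometer; the RBF kernel, the fixed hyperparameters, and all function names are illustrative assumptions rather than the authors' setup (in practice the hyperparameters would be fitted, e.g. by maximizing the marginal likelihood).

```python
import numpy as np


def rbf_kernel(X1, X2, length_scales, signal_var):
    """Squared-exponential kernel with one length scale per input column."""
    diff = (X1[:, None, :] - X2[None, :, :]) / length_scales
    return signal_var * np.exp(-0.5 * np.sum(diff ** 2, axis=-1))


def gp_posterior(X_train, y_train, X_test, length_scales, signal_var, noise_var):
    """Posterior mean and latent variance of a GP fitted to centred targets.

    Hypothetical columns of X: [hot-wire reading, air temperature];
    y: reference wind speed from a trusted anemometer.
    """
    y_mean = y_train.mean()
    K = rbf_kernel(X_train, X_train, length_scales, signal_var)
    K += noise_var * np.eye(len(X_train))
    L = np.linalg.cholesky(K)
    # alpha = K^{-1} (y - y_mean), computed through the Cholesky factor
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - y_mean))
    K_s = rbf_kernel(X_test, X_train, length_scales, signal_var)
    mean = y_mean + K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    # clip tiny negative values caused by round-off
    var = np.maximum(signal_var - np.sum(v ** 2, axis=0), 0.0)
    return mean, var
```

The returned variance is what provides a per-measurement uncertainty: `np.sqrt(var)` is the posterior standard deviation of the predicted speed, and it grows for readings far from the calibration data.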
The synthesis of string transformation programs from input-output examples
utilizes various techniques, all based on an inductive bias that comprises a
restricted set of basic operators to be combined. We propose a new algorithm,
Transduce, which is based on the construction of abstract transduction
grammars and their generalization. We experimentally demonstrate that Transduce
can learn positional transformations efficiently from one or two positive
examples without inductive bias, achieving a success rate higher than the
current state of the art.
Empirical studies have widely demonstrated that neural networks are highly
sensitive to small, adversarial perturbations of the input. The worst-case
robustness against these so-called adversarial examples can be quantified by
the Lipschitz constant of the neural network. In this paper, we study upper and
lower bounds for the Lipschitz constant of random ReLU neural networks.
Specifically, we assume that the weights and biases follow a generalization of
the He initialization, where general symmetric distributions for the biases are
permitted. For shallow neural networks, we characterize the Lipschitz constant
up to an absolute numerical constant. For deep networks with fixed depth and
sufficiently large width, our established upper bound is larger than the lower
bound by a factor that is logarithmic in the width.
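To make the two quantities concrete, the following NumPy sketch computes an empirical lower bound on the Lipschitz constant (the largest input-gradient norm over random sample points, valid because a ReLU network is differentiable almost everywhere) and the simple, usually loose, upper bound given by the product of the layers' spectral norms. The uniform bias distribution, the layer widths, and the function names are assumptions for illustration; the paper's results are probabilistic bounds on these quantities, not this computation.

```python
import numpy as np


def random_relu_net(widths, rng):
    """He-style init: W ~ N(0, 2 / fan_in); biases from a symmetric
    distribution (uniform on [-1, 1] here, an assumption)."""
    params = []
    for fan_in, fan_out in zip(widths[:-1], widths[1:]):
        W = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_out, fan_in))
        b = rng.uniform(-1.0, 1.0, size=fan_out)
        params.append((W, b))
    return params


def input_gradient(params, x):
    """Gradient of the scalar output w.r.t. x: W_L D_{L-1} W_{L-1} ... D_1 W_1."""
    J = np.eye(x.shape[0])
    h = x
    for W, b in params[:-1]:
        pre = W @ h + b
        mask = (pre > 0).astype(float)      # diagonal of D_l
        J = (mask[:, None] * W) @ J
        h = np.maximum(pre, 0.0)
    W_out, _ = params[-1]                   # linear output layer
    return (W_out @ J).ravel()


def lipschitz_bounds(params, d, n_samples, rng):
    """Empirical lower bound and product-of-spectral-norms upper bound."""
    lower = max(np.linalg.norm(input_gradient(params, rng.normal(size=d)))
                for _ in range(n_samples))
    upper = float(np.prod([np.linalg.norm(W, 2) for W, _ in params]))
    return lower, upper
```

The gap between the two numbers grows with depth, which is why sharper analytic bounds such as those in the paper are of interest.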
We consider transformer encoders with hard attention (in which all attention
is focused on exactly one position) and strict future masking (in which each
position only attends to positions strictly to its left), and prove that the
class of languages recognized by these networks is exactly the star-free
languages. Adding position embeddings increases the class of recognized
languages to other well-studied classes. A key technique in these proofs is
Boolean RASP, a variant of RASP that is restricted to Boolean values. Via the
star-free languages, we relate transformers to first-order logic, temporal
logic, and algebraic automata theory.
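A minimal sketch of one layer of hard attention with strict future masking, assuming leftmost tie-breaking and a default output at the first position (neither is specified in the abstract). The hypothetical helper `seen_before` shows how such a layer can express a typical past-time temporal operator, "the symbol occurred at some strictly earlier position".

```python
from typing import List, Sequence


def strict_masked_hard_attention(
    scores: Sequence[Sequence[float]],
    values: Sequence[float],
    default: float = 0.0,
) -> List[float]:
    """Hard attention with strict future masking.

    scores[i][j] is the score of query position i on key position j.
    Position i may only look at j < i and copies the value at the single
    highest-scoring position, breaking ties towards the leftmost one
    (an assumption). Position 0 sees nothing and gets `default`.
    """
    out: List[float] = []
    for i in range(len(values)):
        if i == 0:
            out.append(default)
            continue
        best = max(range(i), key=lambda j: (scores[i][j], -j))
        out.append(values[best])
    return out


def seen_before(tokens: str, symbol: str) -> List[float]:
    """1.0 at position i iff `symbol` occurs at some position j < i."""
    scores = [[1.0 if t == symbol else 0.0 for t in tokens] for _ in tokens]
    values = [1.0 if t == symbol else 0.0 for t in tokens]
    return strict_masked_hard_attention(scores, values)
```

For example, `seen_before("bbab", "a")` marks only the last position, since it is the only one with an `a` strictly to its left.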
Denoising diffusions are a powerful method to generate approximate samples
from high-dimensional data distributions. Recent results provide polynomial
bounds on their convergence rate, assuming $L^2$-accurate scores. Until now,
the tightest bounds were either superlinear in the data dimension or required
strong smoothness assumptions. We provide the first convergence bounds which
are linear in the data dimension (up to logarithmic factors) assuming only
finite second moments of the data distribution. We show that diffusion models
require at most $\tilde O(\frac{d \log^2(1/\delta)}{\varepsilon^2})$ steps to
approximate an arbitrary distribution on $\mathbb{R}^d$ corrupted with Gaussian
noise of variance $\delta$ to within $\varepsilon^2$ in KL divergence. Our
proof extends the Girsanov-based methods of previous works. We introduce a
refined treatment of the error from discretizing the reverse SDE inspired by
stochastic localization.
Developing tools to automatically detect check-worthy claims in political
debates and speeches can greatly help moderators of debates, journalists, and
fact-checkers. While previous work on this problem has focused exclusively on
the text modality, here we explore the utility of the audio modality as an
additional input. We create a new multimodal dataset (text and audio in
English) containing 48 hours of speech from past political debates in the USA.
We then experimentally demonstrate that, in the case of multiple speakers,
adding the audio modality yields sizable improvements over using the text
modality alone; moreover, an audio-only model could outperform a text-only one
for a single speaker. To enable future research, we make all our data and code
publicly available at
https://github.com/petar-iv/audio-checkworthiness-detection.
DNNs are widely used but incur significant computational costs from matrix
multiplications, especially from data movement between the memory and the
processing units. Processing-in-Memory (PIM) is therefore a promising approach,
as it greatly reduces this overhead. However, most PIM solutions rely either on
novel memory technologies that have yet to mature or on bit-serial computations
that have significant performance overhead and scalability issues. Our work
proposes an in-SRAM digital multiplier that uses a conventional memory to
perform bit-parallel computations by activating multiple wordlines. We then
introduce DAISM, an architecture leveraging this multiplier, which achieves up
to two orders of magnitude higher area efficiency than its SOTA counterparts,
with competitive energy efficiency.
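The abstract does not describe the circuit itself, so the following is only the textbook partial-product decomposition that any digital multiplier implements, written in Python to show what "bit-parallel" means in contrast to "bit-serial"; it is not DAISM's in-SRAM design, and the function names are hypothetical.

```python
from typing import List


def partial_products(a: int, b: int, bits: int) -> List[int]:
    """AND the multiplicand `a` with every bit of `b`, shifted into place.

    Each entry is independent of the others, so all of them can be formed
    in the same step (bit-parallel) instead of one multiplier bit per step
    (bit-serial).
    """
    return [(a if (b >> k) & 1 else 0) << k for k in range(bits)]


def multiply_unsigned(a: int, b: int, bits: int = 8) -> int:
    """Unsigned multiplication as the sum of shifted partial products."""
    if not (0 <= a < 2 ** bits and 0 <= b < 2 ** bits):
        raise ValueError("operands must fit in `bits` unsigned bits")
    return sum(partial_products(a, b, bits))
```

A bit-serial scheme would need `bits` sequential steps to generate these terms; forming them all at once is what trades extra parallel hardware for latency.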
As a classical generative modeling approach, energy-based models have the
natural advantage of flexibility in the form of the energy function. Recently,
energy-based models have achieved great success in modeling high-dimensional
data in computer vision and natural language processing. In line with these
advancements, we build a multi-purpose energy-based probabilistic model for
High Energy Physics events at the Large Hadron Collider. This framework builds
on a powerful generative model and describes higher-order inter-particle
interactions. It suits different encoding architectures and builds on implicit
generation. In terms of applications, it can serve as a powerful
parameterized event generator for physics simulation, a generic anomalous
signal detector free from spurious correlations, and an augmented event
classifier for particle identification.
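"Implicit generation" in energy-based models usually means drawing samples by running MCMC on the energy, most commonly (unadjusted) Langevin dynamics. The sketch below shows that sampler on a toy quadratic energy $E(x) = \tfrac12\lVert x-\mu\rVert^2$, whose target distribution is a unit Gaussian centred at $\mu$; the energy, step size, and chain counts are assumptions, and the paper's neural energy function over particle events is not reproduced.

```python
import numpy as np


def energy_grad(x, mu):
    """Gradient of the toy energy E(x) = 0.5 * ||x - mu||^2 (an assumption)."""
    return x - mu


def langevin_sample(mu, n_chains=5000, n_steps=500, step=0.1, seed=0):
    """Unadjusted Langevin dynamics: x <- x - (step/2) grad E(x) + sqrt(step) noise."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n_chains, mu.shape[0]))   # initial samples
    for _ in range(n_steps):
        noise = rng.normal(size=x.shape)
        x = x - 0.5 * step * energy_grad(x, mu) + np.sqrt(step) * noise
    return x
```

With a learned energy, `energy_grad` would be replaced by the gradient of the network with respect to its input. Omitting a Metropolis correction leaves a small discretization bias (here the stationary variance is slightly above 1).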
We introduce a novel procedure for obtaining cross-validated predictive
estimates for Bayesian hierarchical regression models (BHRMs). Bayesian
hierarchical models are popular for their ability to model complex dependence
structures and provide probabilistic uncertainty estimates, but can be
computationally expensive to run. Cross-validation (CV) is therefore rarely
used to evaluate the predictive performance of BHRMs. Our method circumvents
the need to re-run computationally costly estimation methods for each
cross-validation fold and makes CV more feasible for large BHRMs. By
conditioning on the variance-covariance parameters, we shift the CV problem
from probability-based sampling to a simple and familiar optimization problem.
In many cases, this produces estimates that are equivalent to full CV. We
provide theoretical results and demonstrate the method's efficacy on publicly
available data and in simulations.
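The idea that fixing the variance-covariance parameters turns CV into an optimization problem has a simple analogue in a Gaussian linear model: with the noise variance $\sigma^2$ and prior variance $\tau^2$ fixed, the posterior mean of the coefficients is a ridge estimate with $\lambda = \sigma^2/\tau^2$, and leave-one-out residuals then follow in closed form from the hat matrix $H$ as $e_i/(1-H_{ii})$, without refitting. The sketch checks this shortcut against brute-force refitting; it illustrates the principle only and is not the authors' BHRM procedure.

```python
import numpy as np


def ridge_loo_residuals(X, y, lam):
    """Closed-form leave-one-out residuals for ridge regression (fixed lam)."""
    p = X.shape[1]
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)  # hat matrix
    return (y - H @ y) / (1.0 - np.diag(H))


def brute_force_loo_residuals(X, y, lam):
    """Refit once per held-out point; used only to check the shortcut."""
    n, p = X.shape
    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        beta = np.linalg.solve(X[keep].T @ X[keep] + lam * np.eye(p),
                               X[keep].T @ y[keep])
        out[i] = y[i] - X[i] @ beta
    return out
```

The shortcut costs one solve instead of $n$, which is the kind of saving that makes CV practical once the expensive variance parameters are held fixed.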
Federated learning is inherently hampered by data heterogeneity, i.e., non-iid
training data distributed across local clients. We propose FLex&Chill, a novel
model training approach for federated learning that exploits the Logit
Chilling method. Through extensive evaluations, we demonstrate that, in the
presence of the non-iid data characteristics inherent in federated learning
systems, this approach can expedite model convergence and improve inference
accuracy. Quantitatively, in our experiments we observe up to a 6X improvement
in global federated learning model convergence time and up to a 3.37%
improvement in inference accuracy.
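Assuming Logit Chilling refers to training with a softmax temperature $T < 1$ (dividing the logits by $T$ before the softmax), the change to a standard classification loss is small. The sketch shows a temperature-scaled softmax and cross-entropy; the value of $T$ and where it is applied during federated training are not given in the abstract, and the function names are hypothetical.

```python
import numpy as np


def softmax_with_temperature(logits, T):
    """Softmax of logits / T; T < 1 ("chilled") sharpens the distribution."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max()                 # numerical stability
    e = np.exp(z)
    return e / e.sum()


def cross_entropy_with_temperature(logits, label, T):
    """Negative log-likelihood of `label` under the temperature-scaled softmax."""
    return float(-np.log(softmax_with_temperature(logits, T)[label]))
```

Lowering $T$ amplifies differences between logits, so the gradients emphasize the margin between the predicted class and the rest.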
Graph Neural Networks (GNNs) have become the preferred tool to process graph
data, with their efficacy being boosted through graph data augmentation
techniques. Despite the evolution of augmentation methods, issues like graph
property distortions and restricted structural changes persist. This leads to
the question: Is it possible to develop more property-conserving and
structure-sensitive augmentation methods? Through a spectral lens, we
investigate the interplay between graph properties, their augmentation, and
their spectral behavior, and find that keeping the low-frequency eigenvalues
unchanged can preserve the critical properties at a large scale when generating
augmented graphs. These observations inform our introduction of the Dual-Prism
(DP) augmentation method, comprising DP-Noise and DP-Mask, which adeptly
retains essential graph properties while diversifying augmented graphs.
Extensive experiments validate the efficiency of our approach, providing a new
and promising direction for graph data augmentation.
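One way to read "keeping the low-frequency eigenvalues unchanged" is the following NumPy sketch: eigendecompose the graph Laplacian, perturb only the eigenvalues above the `k_keep` smallest, and reassemble the matrix. This is a hypothetical illustration in the spirit of DP-Noise, not the paper's algorithm; in particular, turning the perturbed matrix back into a valid graph (e.g. by thresholding its off-diagonal entries) is left out.

```python
import numpy as np


def laplacian(A):
    """Combinatorial Laplacian L = D - A of an undirected graph."""
    return np.diag(A.sum(axis=1)) - A


def spectral_noise_augment(A, k_keep, sigma, seed=0):
    """Add Gaussian noise to all but the k_keep smallest Laplacian eigenvalues.

    Returns the perturbed matrix together with the original spectrum and
    eigenvectors, so the caller can check the low-frequency part is untouched.
    """
    lam, U = np.linalg.eigh(laplacian(A))   # eigenvalues in ascending order
    rng = np.random.default_rng(seed)
    lam_aug = lam.copy()
    lam_aug[k_keep:] += rng.normal(0.0, sigma, size=lam.size - k_keep)
    L_aug = U @ np.diag(lam_aug) @ U.T
    return L_aug, lam, U
```

Because the eigenvectors are reused, the low-frequency eigenpairs of the original graph remain exact eigenpairs of the augmented matrix.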
Graph Neural Networks (GNNs), particularly those based on the message-passing
approach, have shown considerable effectiveness in a variety of graph learning
tasks in recent years. However, their performance is often constrained by a
limited receptive field, a challenge that becomes more acute in sparse graphs.
Inspired by power series, which offer infinite expansion capabilities, we
propose a novel Graph Power Filter Neural Network (GPFN) that enhances node
classification by employing a power series graph filter to augment the
receptive field. Concretely, GPFN builds a graph filter with an infinite
receptive field from a convergent power series, which can be analyzed in both
the spectral and spatial domains. Besides, we
theoretically prove that our GPFN is a general framework that can integrate any
power series and capture long-range dependencies. Finally, experimental results
on three datasets demonstrate the superiority of our GPFN over state-of-the-art
baselines.
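The geometric series is the simplest convergent power series filter: for a symmetrically normalized adjacency $\hat A$ and $|\alpha| < 1$, $\sum_{k\ge 0} (\alpha \hat A)^k X = (I - \alpha \hat A)^{-1} X$, so every hop contributes and the receptive field is unbounded. The sketch compares a truncated sum with the closed form; GPFN admits other power series, and this particular choice and the function names are assumptions.

```python
import numpy as np


def normalized_adjacency(A):
    """D^{-1/2} A D^{-1/2}; assumes every node has at least one neighbour."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]


def truncated_power_filter(A_hat, X, alpha, K):
    """sum_{k=0}^{K} (alpha * A_hat)^k X: information from up to K hops."""
    out = np.zeros_like(X)
    term = X.copy()
    for _ in range(K + 1):
        out += term
        term = alpha * (A_hat @ term)
    return out


def geometric_power_filter(A_hat, X, alpha):
    """Closed form of the infinite series, (I - alpha * A_hat)^{-1} X.

    Converges for |alpha| < 1 because the spectral radius of A_hat is at most 1.
    """
    n = A_hat.shape[0]
    return np.linalg.solve(np.eye(n) - alpha * A_hat, X)
```

The closed form costs a single linear solve, whereas the truncated sum needs many hops to approach it when $\alpha$ is close to 1.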
Digital-analog quantum computing (DAQC) is an alternative paradigm for
universal quantum computation combining digital single-qubit gates with global
analog operations acting on a register of interacting qubits. Currently, no
available open-source software is tailored to express, differentiate, and
execute programs within the DAQC paradigm. In this work, we address this
shortfall by presenting Qadence, a high-level programming interface, developed
at Pasqal, for building complex digital-analog quantum programs. Thanks to its
flexible interface, native differentiability, and focus on real-device
execution, Qadence aims to advance research on variational quantum algorithms
built for native DAQC platforms such as Rydberg atom arrays.
We propose an adjusted Wasserstein distributionally robust estimator, based on
a nonlinear transformation of the Wasserstein distributionally robust (WDRO)
estimator in statistical learning. The classic WDRO estimator is asymptotically
biased, while our adjusted WDRO estimator is asymptotically unbiased, resulting
in a smaller asymptotic mean squared error. Meanwhile, the proposed adjusted
WDRO estimator has an out-of-sample performance guarantee. Further, under
certain conditions, our proposed adjustment technique provides a general
principle for de-biasing asymptotically biased estimators. Specifically, we
investigate how the adjusted WDRO estimator is developed in the generalized
linear model, including logistic regression, linear regression, and Poisson
regression.
Numerical experiments demonstrate the favorable practical performance of the
adjusted estimator over the classic one.
In this paper, we propose a new non-linear classifier based on a combination
of locally linear classifiers. We cast the problem as a well-known optimization
formulation, an $\ell_1$ Multiple Kernel Learning (MKL) problem, using many
locally linear kernels. Since the number of such kernels is huge, we provide a
scalable generic MKL training algorithm that handles streaming kernels. In terms
of inference time, the resulting classifier fills the gap between accurate but
slow non-linear classifiers (such as classical MKL) and fast but less accurate
linear classifiers.
In this paper, we discuss a potential agenda for future work in the theory of
random sets and belief functions, touching upon a number of focal issues: the
development of a fully-fledged theory of statistical reasoning with random
sets, including the generalisation of logistic regression and of the classical
laws of probability; the further development of the geometric approach to
uncertainty, to include general random sets, a wider range of uncertainty
measures and alternative geometric representations; and the application of
this new theory to high-impact areas such as climate change, machine learning
and statistical learning theory.
In this post, we demonstrate how to use neural architecture search (NAS) based structural pruning to compress a fine-tuned BERT model to improve model performance and reduce inference times. Pre-trained language models (PLMs) are undergoing rapid commercial and enterprise adoption in the areas of productivity tools, customer service, search and recommendations, business process automation, and […]
Despite the success of deep learning-based algorithms, it is widely known
that neural networks may fail to be robust. A popular paradigm to enforce
robustness is adversarial training (AT); however, AT introduces many
computational and theoretical difficulties. Recent works have developed a
connection between AT in the multiclass classification setting and
multimarginal optimal transport (MOT), unlocking a new set of tools to study
this problem. In this paper, we leverage the MOT connection to propose
computationally tractable numerical algorithms for computing universal lower
bounds on the optimal adversarial risk and identifying optimal classifiers. We
propose two main algorithms based on linear programming (LP) and entropic
regularization (Sinkhorn). Our key insight is that one can harmlessly truncate
the higher order interactions between classes, preventing the combinatorial run
times typically encountered in MOT problems. We validate these results with
experiments on MNIST and CIFAR-10, which demonstrate the tractability of our
approach.
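For reference, the entropic-regularization route relies on Sinkhorn iterations. The sketch below is the standard two-marginal version, which alternately rescales the rows and columns of the Gibbs kernel $e^{-C/\varepsilon}$ until the transport plan has the prescribed marginals; the multimarginal variant with truncated higher-order class interactions described in the abstract is not reproduced.

```python
import numpy as np


def sinkhorn(a, b, C, eps, n_iter=1000):
    """Entropic optimal-transport plan between histograms a and b for cost C."""
    K = np.exp(-C / eps)           # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)            # rescale rows to match a
        v = b / (K.T @ u)          # rescale columns to match b
    return u[:, None] * K * v[None, :]
```

Smaller `eps` brings the plan closer to the unregularized linear-programming solution at the cost of slower, less stable convergence.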
Online reviews in the form of user-generated content (UGC) significantly
impact consumer decision-making. However, the pervasiveness of fake content,
both human-written and machine-generated, challenges UGC's reliability. Recent
advances in Large Language Models (LLMs) may make it possible to fabricate
indistinguishable fake content at a much lower cost. Leveraging OpenAI's
GPT-4-Turbo and DALL-E-2 models, we craft AiGen-FoodReview, a
multi-modal dataset of 20,144 restaurant review-image pairs divided into
authentic and machine-generated. We explore unimodal and multimodal detection
models, achieving 99.80% multimodal accuracy with FLAVA. We use attributes from
readability and photographic theories to score reviews and images,
respectively, demonstrating their utility as hand-crafted features in scalable
and interpretable detection models, with comparable performance. The paper
contributes by open-sourcing the dataset and releasing fake review detectors,
recommending its use in unimodal and multimodal fake review detection tasks,
and evaluating linguistic and visual features in synthetic versus authentic
data.
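As an example of a readability attribute, the sketch computes the Flesch Reading Ease score, $206.835 - 1.015\,(\text{words}/\text{sentences}) - 84.6\,(\text{syllables}/\text{words})$, with a crude vowel-group syllable counter. Whether the authors used this particular score, and how they tokenized reviews, are assumptions.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic: count groups of consecutive vowels (at least one)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease; higher scores indicate easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word and one sentence")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

Scores like this are cheap to compute and directly interpretable, which is the appeal of hand-crafted features over end-to-end detectors.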
This paper introduces the Expected Booking (xB) model, a novel metric
designed to estimate the likelihood of a foul resulting in a yellow card in
football. Through three iterative experiments, employing ensemble methods, the
model demonstrates improved performance with additional features and an
expanded dataset. Analysis of FIFA World Cup 2022 data validates the model's
efficacy in providing insights into team and player fouling tactics, aligning
with actual defensive performance. The xB model addresses a gap in the
examination of fouling efficiency, emphasizing defensive strategies that are
often overlooked. Further enhancements are suggested through the incorporation
of comprehensive data and spatial features.
This paper discusses the limitations of machine learning (ML), particularly
deep artificial neural networks (ANNs), which are effective at approximating
complex functions but often lack transparency and explanatory power. It
highlights the 'problem of induction': the philosophical issue that past
observations may not necessarily predict future events, a challenge that ML
models face when encountering new, unseen data. The paper argues for the
importance of not just making predictions but also providing good explanations,
a feature that current models often fail to deliver. It suggests that for AI to
progress, we must seek models that offer insights and explanations, not just
predictions.
This paper proposes two methods for causal additive models with unobserved
variables (CAM-UV). CAM-UV assumes that the causal functions take the form of
generalized additive models and that latent confounders are present. First, we
propose a method that leverages prior knowledge for efficient causal discovery.
Then, we propose an extension of this method for inferring causality in time
series data. The original CAM-UV algorithm differs from other existing causal
function models in that it does not seek the causal order between observed
variables, but rather aims to identify the causes for each observed variable.
Therefore, the first proposed method in this paper utilizes prior knowledge,
such as understanding that certain variables cannot be causes of specific
others. Moreover, by incorporating the prior knowledge that causes precede
their effects in time, we extend the first algorithm to the second method for
causal discovery in time series data. We validate the first proposed method by
using simulated data to demonstrate that the accuracy of causal discovery
increases as more prior knowledge is accumulated. Additionally, we test the
second proposed method by comparing it with existing time series causal
discovery methods, using both simulated data and real-world data.
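The temporal prior knowledge that causes precede their effects can be encoded mechanically. Below is a hypothetical sketch that builds a matrix of forbidden cause-effect pairs over lagged copies of the variables; the `(lag, variable)` node layout is an assumption, and how CAM-UV actually consumes such prior knowledge is not shown.

```python
import numpy as np


def temporal_prior(n_vars, max_lag):
    """Prior-knowledge matrix for lagged time-series variables.

    Nodes are (lag, variable) pairs; lag 0 is the current time step and a
    larger lag lies further in the past. forbidden[i, j] is True when node j
    is known NOT to be a cause of node i because j occurs later than i.
    Contemporaneous pairs (same lag) are left undecided.
    """
    nodes = [(lag, v) for lag in range(max_lag + 1) for v in range(n_vars)]
    m = len(nodes)
    forbidden = np.zeros((m, m), dtype=bool)
    for i, (lag_i, _) in enumerate(nodes):
        for j, (lag_j, _) in enumerate(nodes):
            if lag_j < lag_i:  # j is more recent than i, so it cannot cause i
                forbidden[i, j] = True
    return nodes, forbidden
```

Every forbidden entry removes a candidate parent before any statistical test is run, which is where the efficiency gain from prior knowledge comes from.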
Adversarial attacks on Face Recognition (FR) encompass two types:
impersonation attacks and evasion (dodging) attacks. We observe that achieving
a successful impersonation attack on FR does not necessarily ensure a
successful dodging attack on FR in the black-box setting. Introducing a novel attack
method named Pre-training Pruning Restoration Attack (PPR), we aim to enhance
the performance of dodging attacks whilst avoiding the degradation of
impersonation attacks. Our method employs adversarial example pruning, enabling
a portion of adversarial perturbations to be set to zero, while tending to
maintain the attack performance. By utilizing adversarial example pruning, we
can prune the pre-trained adversarial examples and selectively free up certain
adversarial perturbations. Thereafter, we embed adversarial perturbations in
the pruned area, which enhances the dodging performance of the adversarial face
examples. The effectiveness of our proposed attack method is demonstrated
through our experimental results, showcasing its superior performance.
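A minimal NumPy sketch of the prune-then-restore idea: zero out part of a precomputed adversarial perturbation, then write a second perturbation only into the freed entries under an $L_\infty$ budget. The magnitude-based pruning rule, the budget `eps`, and the source of `new_delta` (e.g. a separate dodging attack) are assumptions, not details given in the abstract.

```python
import numpy as np


def prune_perturbation(delta, prune_ratio):
    """Zero out the `prune_ratio` fraction of entries with smallest magnitude.

    Returns the pruned perturbation and a boolean mask of the freed entries.
    The magnitude criterion is an assumption made for this sketch.
    """
    flat = np.abs(delta).ravel()
    k = int(prune_ratio * flat.size)
    mask = np.zeros(flat.size, dtype=bool)
    mask[np.argsort(flat)[:k]] = True
    mask = mask.reshape(delta.shape)
    return np.where(mask, 0.0, delta), mask


def restore_in_pruned_area(pruned, mask, new_delta, eps):
    """Embed a second perturbation only where entries were freed, clipped to
    the L-infinity budget `eps`; all other entries are kept unchanged."""
    return np.where(mask, np.clip(new_delta, -eps, eps), pruned)
```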
With growing concerns surrounding privacy and regulatory compliance, the
concept of machine unlearning has gained prominence, aiming to selectively
forget or erase specific learned information from a trained model. In response
to this critical need, we introduce a novel approach called Attack-and-Reset
for Unlearning (ARU). This algorithm leverages meticulously crafted adversarial
noise to generate a parameter mask, effectively resetting certain parameters
and rendering them unlearnable. ARU outperforms current state-of-the-art
results on two facial machine-unlearning benchmark datasets, MUFAC and MUCAC.
In particular, we present the steps involved in attacking and masking that
strategically filter and re-initialize network parameters biased towards the
forget set. Our work represents a significant advancement in rendering data
unexploitable to deep learning models through parameter re-initialization,
achieved by harnessing adversarial noise to craft a mask.
The goal of real-time lyrics alignment is to take live singing audio as input
and to pinpoint the exact position within given lyrics on the fly. The task can
benefit real-world applications such as the automatic subtitling of live
concerts or operas. However, designing a real-time model poses a great
challenge due to the constraints of only using past input and operating within
a minimal latency. Furthermore, due to the lack of datasets for real-time
models for lyrics alignment, previous studies have mostly evaluated with
private in-house datasets, resulting in a lack of standard evaluation methods.
This paper presents a real-time lyrics alignment system for classical vocal
performances with two contributions. First, we improve the lyrics alignment
algorithm by finding an optimal combination of chromagram and phonetic
posteriorgram (PPG) features, which capture the melodic and phonetic
characteristics of the singing voice, respectively. Second, we recast the
Schubert Winterreise Dataset (SWD), which contains multiple performance
renditions of the same pieces, as an evaluation set for real-time lyrics
alignment.
Graph neural networks are increasingly becoming the framework of choice for
graph-based machine learning. In this paper, we propose a new graph neural
network architecture that substitutes classical message passing with an
analysis of the local distribution of node features. To this end, we extract
the distribution of features in the egonet for each local neighbourhood and
compare them against a set of learned label distributions by taking the
histogram intersection kernel. The similarity information is then propagated to
other nodes in the network, effectively creating a message passing-like
mechanism where the message is determined by the ensemble of the features. We
perform an ablation study to evaluate the network's performance under different
choices of its hyper-parameters. Finally, we test our model on standard graph
classification and regression benchmarks, and we find that it outperforms
widely used alternative approaches, including both graph kernels and graph
neural networks.
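The histogram intersection kernel is the sum of bin-wise minima of two histograms and equals 1 for identical normalized histograms. The sketch computes egonet feature histograms for a scalar node feature and compares them with fixed reference histograms that stand in for the learned label distributions; the binning, the scalar feature, and the function names are assumptions, and the propagation step is omitted.

```python
import numpy as np


def normalized_histogram(values, bins):
    """Histogram of `values` over `bins`, normalized to sum to 1."""
    counts, _ = np.histogram(values, bins=bins)
    return counts / max(counts.sum(), 1)


def histogram_intersection(h1, h2):
    """Histogram intersection kernel: sum of bin-wise minima."""
    return float(np.minimum(h1, h2).sum())


def egonet_similarities(adj, x, bins, reference_hists):
    """Compare each node's egonet (the node plus its neighbours) feature
    histogram with every reference histogram."""
    sims = np.zeros((adj.shape[0], len(reference_hists)))
    for v in range(adj.shape[0]):
        ego = np.concatenate(([v], np.nonzero(adj[v])[0]))
        h = normalized_histogram(x[ego], bins)
        for r, ref in enumerate(reference_hists):
            sims[v, r] = histogram_intersection(h, ref)
    return sims
```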
We introduce a novel capacity measure 2sED for statistical models based on
the effective dimension. The new quantity provably bounds the generalization
error under mild assumptions on the model. Furthermore, simulations on standard
data sets and popular model architectures show that 2sED correlates well with
the training error. For Markovian models, we show how to efficiently
approximate 2sED from below through a layerwise iterative approach, which
allows us to tackle deep learning models with a large number of parameters.
Simulation results suggest that the approximation is good for different
prominent models and data sets.
Due to the complex behavior arising from non-uniqueness, symmetry, and
bifurcations in the solution space, solving inverse problems of nonlinear
differential equations (DEs) with multiple solutions is a challenging task. To
address this, we propose homotopy physics-informed neural networks (HomPINNs),
a novel framework that leverages homotopy continuation and neural networks
(NNs) to solve inverse problems. The proposed framework begins with the use of
NNs to simultaneously approximate unlabeled observations across diverse
solutions while adhering to DE constraints. Through homotopy continuation, the
proposed method solves the inverse problem by tracing the observations and
identifying multiple solutions. The experiments involve testing the performance
of the proposed method on one-dimensional DEs and applying it to solve a
two-dimensional Gray-Scott simulation. Our findings demonstrate that the
proposed method is scalable and adaptable, providing an effective solution for
solving DEs with multiple solutions and unknown parameters. Moreover, it has
significant potential for various applications in scientific computing, such as
modeling complex systems and solving inverse problems in physics, chemistry,
biology, etc.
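HomPINNs build on homotopy continuation. In its simplest scalar form, one deforms an easy start system $g$ with known roots into the target $f$ via $H(x,t) = (1-t)\,g(x) + t\,f(x)$ and tracks each root with Newton corrections as $t$ goes from 0 to 1, which recovers several solutions from several starting points. The sketch shows only this classical idea, not the neural-network framework; the function name and step counts are assumptions.

```python
import numpy as np


def track_root(f, df, g, dg, x0, n_steps=100, newton_iters=5):
    """Follow a root of the start system g (t = 0) to a root of f (t = 1)
    along H(x, t) = (1 - t) * g(x) + t * f(x), correcting with Newton steps."""
    x = float(x0)
    for t in np.linspace(0.0, 1.0, n_steps + 1)[1:]:
        for _ in range(newton_iters):
            H = (1 - t) * g(x) + t * f(x)
            dH = (1 - t) * dg(x) + t * df(x)
            x -= H / dH
    return x
```

For instance, deforming $g(x) = x^3 - x$ (roots $-1, 0, 1$) into $f(x) = x^3 - 3x$ and starting from each root of $g$ recovers all three roots of $f$: $0$ and $\pm\sqrt{3}$.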
Semantic similarity between natural language texts is typically measured
either by looking at the overlap between subsequences (e.g., BLEU) or by using
embeddings (e.g., BERTScore, S-BERT). In this paper, we argue that when we
are only interested in measuring semantic similarity, it is better to directly
predict the similarity using a model fine-tuned for that task. Using a model
fine-tuned on the Semantic Textual Similarity Benchmark (STS-B) task from the
GLUE benchmark, we define the STSScore approach and show that the resulting
similarity is better aligned with our expectations of a robust semantic
similarity measure than other approaches.
Anomaly, or out-of-distribution, detection is a promising tool for aiding
discoveries of new particles or processes in particle physics. In this work, we
identify and address two overlooked opportunities to improve anomaly detection
for high-energy physics. First, rather than training a generative model on the
single most dominant background process, we build detection algorithms using
representation learning from multiple background types, thus taking advantage
of more information to improve estimation of what is relevant for detection.
Second, we generalize decorrelation to the multi-background setting, thus
directly enforcing a more complete definition of robustness for anomaly
detection. We demonstrate the benefit of the proposed robust multi-background
anomaly detection algorithms on a high-dimensional dataset of particle decays
at the Large Hadron Collider.
Unsupervised Multiple Domain Translation is the task of transforming data
from one domain to other domains without having paired data to train the
systems. Typically, methods based on Generative Adversarial Networks (GANs) are
used to address this task. However, our proposal exclusively relies on a
modified version of a Variational Autoencoder. This modification consists of
the use of two latent variables disentangled in a controlled way by design. One
of these latent variables is constrained to depend exclusively on the domain,
while the other must depend on the remaining factors of variability in the data.
Additionally, the conditions imposed on the domain latent variable allow for
better control and understanding of the latent space. We empirically
demonstrate that our approach works on different vision datasets, improving on
the performance of other well-known methods. Finally, we prove that, indeed, one of
the latent variables stores all the information related to the domain and the
other one hardly contains any domain information.
Audio embeddings are crucial tools in understanding large catalogs of music.
Typically, embeddings are evaluated on the basis of the performance they
provide in a wide range of downstream tasks; however, few studies have
investigated the local properties of the embedding spaces themselves, which are
important in nearest neighbor algorithms, commonly used in music search and
recommendation. In this work, we show that when learning audio representations
on music datasets via contrastive learning, musical properties that are
typically homogeneous within a track (e.g., key and tempo) are reflected in the
locality of neighborhoods in the resulting embedding space. By applying
appropriate data augmentation strategies, not only can the localisation of such
properties be reduced, but the localisation of other attributes can also be
increased. For example, the locality of features such as pitch and tempo, which
are less relevant to non-expert listeners, may be mitigated while improving the
locality of more salient features such as genre and mood, achieving
state-of-the-art performance in nearest neighbor retrieval accuracy. Similarly, we show that the optimal
selection of data augmentation strategies for contrastive learning of music
audio embeddings is dependent on the downstream task, highlighting this as an
important embedding design decision.
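The abstract does not specify the contrastive objective, so the following is a minimal sketch assuming the common SimCLR-style NT-Xent loss: two augmented views of the same clip are pulled together and all other clips in the batch are pushed apart. Whatever the augmentations change between the two views (for example, a hypothetical pitch shift) is what the embedding learns to ignore, which is the lever the authors use to reshape locality. The temperature value is illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nt_xent(z_a, z_b, temperature=0.1):
    """SimCLR-style NT-Xent loss; z_a[i] and z_b[i] embed two augmented views of clip i."""
    z = list(z_a) + list(z_b)  # 2N embeddings
    n = len(z_a)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # the other view of the same clip is the positive
        sims = {k: cosine(z[i], z[k]) / temperature for k in range(2 * n) if k != i}
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += log_denom - sims[pos]  # -log softmax probability of the positive
    return total / (2 * n)


# views that agree with their own clip give a much lower loss than mismatched ones
aligned = nt_xent([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.1], [0.1, 1.0]])
swapped = nt_xent([[1.0, 0.0], [0.0, 1.0]], [[0.1, 1.0], [1.0, 0.1]])
```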
( 3
min )
End-to-end learning has emerged as a major paradigm for developing autonomous
systems. Unfortunately, with its performance and convenience comes an even
greater challenge of safety assurance. A key factor of this challenge is the
absence of the notion of a low-dimensional and interpretable dynamical state,
around which traditional assurance methods revolve. Focusing on the online
safety prediction problem, this paper proposes a configurable family of
learning pipelines based on generative world models, which do not require
low-dimensional states. To implement these pipelines, we overcome the
challenges of learning safety-informed latent representations and missing
safety labels under prediction-induced distribution shift. These pipelines come
with statistical calibration guarantees on their safety chance predictions
based on conformal prediction. We perform an extensive evaluation of the
proposed learning pipelines on two case studies of image-controlled systems: a
racing car and a cartpole.
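The calibration guarantee can be illustrated with generic split conformal prediction: given n exchangeable calibration scores, thresholding at the ceil((n+1)(1-alpha))-th smallest score covers a new exchangeable point with probability at least 1-alpha. How the paper turns safety-chance predictions into nonconformity scores is not stated in the abstract, so `cal_scores` below is a placeholder.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Split-conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest calibration score."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha: no finite threshold
    return sorted(cal_scores)[k - 1]


def is_covered(score, threshold):
    """A new score at or below the threshold falls inside the conformal prediction set."""
    return score <= threshold
```

Using the order statistic rather than an interpolated quantile is what makes the finite-sample coverage bound hold without distributional assumptions.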
( 2
min )
Vertical Federated Learning (VFL) is a crucial paradigm for training machine
learning models on feature-partitioned, distributed data. However, due to
privacy restrictions, few public real-world VFL datasets exist for algorithm
evaluation, and these represent a limited array of feature distributions.
Existing benchmarks often resort to synthetic datasets, derived from arbitrary
feature splits from a global set, which only capture a subset of feature
distributions, leading to inadequate algorithm performance assessment. This
paper addresses these shortcomings by introducing two key factors affecting VFL
performance, namely feature importance and feature correlation, and by proposing
associated evaluation metrics and dataset splitting methods. Additionally, we
introduce a real VFL dataset to address the deficit in image-image VFL
scenarios. Our comprehensive evaluation of cutting-edge VFL algorithms provides
valuable insights for future research in the field.
( 2
min )
We study the classical Network Revenue Management (NRM) problem with
accept/reject decisions and $T$ IID arrivals. We consider a distributional form
where each arrival must fall under a finite number of possible categories, each
with a deterministic resource consumption vector, but a random value
distributed continuously over an interval. We develop an online algorithm that
achieves $O(\log^2 T)$ regret under this model, with the only (necessary)
assumption being that the probability densities are bounded away from 0. We
derive a second result that achieves $O(\log T)$ regret under an additional
assumption of second-order growth. To our knowledge, these are the first
results achieving logarithmic-level regret in an NRM model with continuous
values that do not require any kind of "non-degeneracy" assumptions. Our
results are achieved via new techniques including a new method of bounding
myopic regret, a "semi-fluid" relaxation of the offline allocation, and an
improved bound on the "dual convergence".
( 2
min )
Deep learning techniques, despite their potential, often suffer from a lack
of reproducibility and generalizability, impeding their clinical adoption.
Image segmentation is one of the critical tasks in medical image analysis, in
which one or several regions/volumes of interest should be annotated. This
paper introduces the RIDGE checklist, a framework for assessing the
Reproducibility, Integrity, Dependability, Generalizability, and Efficiency of
deep learning-based medical image segmentation models. The checklist serves as
a guide for researchers to enhance the quality and transparency of their work,
ensuring that segmentation models are not only scientifically sound but also
clinically relevant.
( 2
min )
Continual learning, the ability of a model to learn over time without
forgetting previous knowledge and therefore to adapt to new data, is
paramount in dynamic fields such as disease outbreak prediction. Deep neural
networks such as LSTMs are prone to error due to catastrophic forgetting. This
study introduces a novel CEL model for continual learning by leveraging domain
adaptation via Elastic Weight Consolidation (EWC). This model aims to mitigate
the catastrophic forgetting phenomenon in a domain-incremental setting. The
Fisher Information Matrix (FIM) is used with EWC to construct a
regularization term that penalizes changes to important parameters, namely,
those encoding important previous knowledge. CEL's performance is evaluated on
three distinct diseases, Influenza, Mpox, and Measles, with different metrics.
CEL's high R-squared values during evaluation and re-evaluation exceed those of
other state-of-the-art models in several contexts, indicating that CEL adapts to
incremental data well. CEL's robustness and reliability are underscored by its
minimal 65% forgetting rate and 18% higher memory stability compared to
existing benchmark studies. This study highlights CEL's versatility in disease
outbreak prediction, addressing evolving data with temporal patterns. It offers
a valuable model for proactive disease control with accurate, timely
predictions.
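The EWC regularizer has a standard form, (λ/2) Σ_j F_j (θ_j − θ*_j)², where F is the diagonal of the Fisher information estimated on the previous domain and θ* are the parameters learned there. The following is a minimal sketch of that penalty using the empirical Fisher (mean of squared per-example gradients). The gradients, λ and parameter vectors are placeholders, and this is not CEL's actual code.

```python
def diagonal_fisher(per_example_grads):
    """Empirical diagonal Fisher: mean of squared per-example gradients of the old-task loss."""
    n = len(per_example_grads)
    dim = len(per_example_grads[0])
    return [sum(g[j] ** 2 for g in per_example_grads) / n for j in range(dim)]


def ewc_penalty(theta, theta_old, fisher, lam=1.0):
    """(lam / 2) * sum_j F_j * (theta_j - theta_old_j)^2."""
    return 0.5 * lam * sum(f * (t - t0) ** 2 for f, t, t0 in zip(fisher, theta, theta_old))


def ewc_objective(new_task_loss, theta, theta_old, fisher, lam=1.0):
    """Loss on the new domain plus the EWC anchor to the previous parameters."""
    return new_task_loss + ewc_penalty(theta, theta_old, fisher, lam)
```

Parameters with zero Fisher value can move freely, while those the old domain relied on are anchored in proportion to their importance.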
( 2
min )
We present Scalable Interpolant Transformers (SiT), a family of generative
models built on the backbone of Diffusion Transformers (DiT). The interpolant
framework, which allows for connecting two distributions in a more flexible way
than standard diffusion models, makes possible a modular study of various
design choices impacting generative models built on dynamical transport: using
discrete vs. continuous time learning, deciding the objective for the model to
learn, choosing the interpolant connecting the distributions, and deploying a
deterministic or stochastic sampler. By carefully introducing the above
ingredients, SiT surpasses DiT uniformly across model sizes on the conditional
ImageNet 256x256 benchmark using the exact same backbone, number of parameters,
and GFLOPs. By exploring various diffusion coefficients, which can be tuned
separately from learning, SiT achieves an FID-50K score of 2.06.
( 2
min )
Using neural networks to localize a key fob within and around a car is fast
emerging as a security feature for keyless entry. In this paper we
1) study the performance of neural-network-based UWB (ultra-wideband)
localization classification on pre-computed features, which forms the baseline
of our experiments; 2) investigate the inherent robustness of various neural
networks, including their robustness to adversarial examples without any
adversarial training; and 3) propose a multi-head self-supervised neural
network architecture that outperforms the baseline neural networks without any
adversarial training. The model's performance improved by 67% at certain ranges
of adversarial magnitude for the fast gradient sign method and by 37% each for
the basic iterative method and the projected gradient descent method.
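For reference, the fast gradient sign method perturbs an input by ε times the sign of the loss gradient with respect to that input. This is a minimal sketch on a toy binary logistic-regression classifier with hypothetical weights, not on the UWB networks. The basic iterative method and PGD repeat this step several times with a smaller step size, and PGD also projects back into the ε-ball.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sign(g):
    return (g > 0) - (g < 0)


def fgsm(x, y, w, b, eps):
    """One FGSM step against binary logistic regression: x_adv = x + eps * sign(grad_x loss)."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    # for the binary cross-entropy loss, the gradient w.r.t. the input is (p - y) * w
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]


w, b = [2.0, -1.0], 0.0  # hypothetical trained weights
x_adv = fgsm([1.0, 1.0], 1, w, b, eps=0.5)  # the clean input is classified as 1
```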
( 2
min )
We present a novel method for anomaly detection in Solar System object data,
in preparation for the Legacy Survey of Space and Time. We train a deep
autoencoder for anomaly detection and use the learned latent space to search
for other interesting objects. We demonstrate the efficacy of the autoencoder
approach by finding interesting examples, such as interstellar objects, and
show that using the autoencoder, further examples of interesting classes can be
found. We also investigate the limits of classic unsupervised approaches to
anomaly detection through the generation of synthetic anomalies and evaluate
the feasibility of using a supervised learning approach. Future work should
consider expanding the feature space to increase the variety of anomalies that
can be uncovered during the survey using an autoencoder.
( 2
min )
In this paper, for the first time, a method is presented that can provide
fully automated surgery based on software and computer vision techniques. Then,
the advantages and challenges of computerizing medical surgery are examined.
Finally, surgery for isolated ovarian endometriosis is examined, and, based on
the presented method, a more detailed algorithm is presented that is capable of
automatically diagnosing and treating this disease during surgery. As proof of
the proposed method, a U-Net is trained to detect endometriosis during surgery.
( 2
min )
In this paper, we introduce DiarizationLM, a framework to leverage large
language models (LLM) to post-process the outputs from a speaker diarization
system. Various goals can be achieved with the proposed framework, such as
improving the readability of the diarized transcript, or reducing the word
diarization error rate (WDER). In this framework, the outputs of the automatic
speech recognition (ASR) and speaker diarization systems are represented as a
compact textual format, which is included in the prompt to an optionally
finetuned LLM. The outputs of the LLM can be used as the refined diarization
results with the desired enhancement. As a post-processing step, this framework
can be easily applied to any off-the-shelf ASR and speaker diarization systems
without retraining existing components. Our experiments show that a finetuned
PaLM 2-S model can reduce the WDER by 55.5% relative on the Fisher telephone
conversation dataset and by 44.9% relative on the Callhome English dataset.
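This sketch shows the kind of compact textual format described above, together with a simplified word diarization error rate. The `<spk:N>` token syntax is an assumption for illustration. `simple_wder` also assumes that hypothesis and reference words are already aligned one-to-one and that speaker labels are already mapped between them, which real WDER scoring has to establish first.

```python
def to_compact_text(words, speakers):
    """Render word-level speaker labels as a compact string for an LLM prompt."""
    parts, prev = [], None
    for word, spk in zip(words, speakers):
        if spk != prev:  # emit a speaker token only when the speaker changes
            parts.append(f"<spk:{spk}>")
            prev = spk
        parts.append(word)
    return " ".join(parts)


def simple_wder(hyp_speakers, ref_speakers):
    """Fraction of aligned words whose speaker label is wrong (assumes a 1:1 alignment)."""
    wrong = sum(h != r for h, r in zip(hyp_speakers, ref_speakers))
    return wrong / len(ref_speakers)
```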
( 2
min )
Inspired by human conscious planning, we propose Skipper, a model-based
reinforcement learning agent utilizing spatio-temporal abstractions to
generalize learned skills in novel situations. It automatically decomposes the
given task into smaller, more manageable subtasks, and hence enables sparse
decision-making and focused computation on the relevant parts of the
environment. This relies on the extraction of an abstracted proxy problem
represented as a directed graph, in which vertices and edges are learned
end-to-end from hindsight. Our theoretical analyses provide performance
guarantees under appropriate assumptions and establish where our approach is
expected to be helpful. Generalization-focused experiments validate Skipper's
significant advantage in zero-shot generalization, compared to existing
state-of-the-art hierarchical planning methods.
( 2
min )
Mechanistic interpretability seeks to understand the internal mechanisms of
machine learning models, where localization (identifying the important model
components) is a key step. Activation patching, also known as causal tracing
or interchange intervention, is a standard technique for this task (Vig et al.,
2020), but the literature contains many variants with little consensus on the
choice of hyperparameters or methodology. In this work, we systematically
examine the impact of methodological details in activation patching, including
evaluation metrics and corruption methods. In several settings of localization
and circuit discovery in language models, we find that varying these
hyperparameters could lead to disparate interpretability results. Backed by
empirical observations, we give conceptual arguments for why certain metrics or
methods may be preferred. Finally, we provide recommendations for the best
practices of activation patching going forwards.
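This toy sketch illustrates activation patching on a hypothetical two-layer ReLU network with hand-chosen weights. It runs a clean input and a corrupted input, copies one hidden unit's clean activation into the corrupted run, and reports the fraction of the clean-minus-corrupted output gap that is restored. The corruption (changing one input value) and the normalised metric are exactly the kinds of choices the paper reports can change the conclusions.

```python
def relu(v):
    return [max(0.0, a) for a in v]


def forward(x, W1, w2, patch=None):
    """Return (output, hidden); patch = (unit_index, value) overwrites one hidden unit."""
    h = relu([sum(wij * xj for wij, xj in zip(row, x)) for row in W1])
    if patch is not None:
        j, value = patch
        h = list(h)
        h[j] = value
    return sum(a * b for a, b in zip(w2, h)), h


def patching_effects(x_clean, x_corrupt, W1, w2):
    """Per hidden unit: 1.0 = patching fully restores the clean output, 0.0 = no effect."""
    clean_out, clean_h = forward(x_clean, W1, w2)
    corrupt_out, _ = forward(x_corrupt, W1, w2)
    effects = []
    for j in range(len(clean_h)):
        patched_out, _ = forward(x_corrupt, W1, w2, patch=(j, clean_h[j]))
        # assumes clean_out != corrupt_out, otherwise the metric is undefined
        effects.append((patched_out - corrupt_out) / (clean_out - corrupt_out))
    return effects
```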
( 2
min )
We present a new representation learning framework, Intensity Profile
Projection, for continuous-time dynamic network data. Given triples $(i,j,t)$,
each representing a time-stamped ($t$) interaction between two entities
($i,j$), our procedure returns a continuous-time trajectory for each node,
representing its behaviour over time. The framework consists of three stages:
estimating pairwise intensity functions, e.g. via kernel smoothing; learning a
projection which minimises a notion of intensity reconstruction error; and
constructing evolving node representations via the learned projection. The
trajectories satisfy two properties, known as structural and temporal
coherence, which we see as fundamental for reliable inference. Moreover, we
develop estimation theory providing tight control on the error of any estimated
trajectory, indicating that the representations could even be used in quite
noise-sensitive follow-on analyses. The theory also elucidates the role of
smoothing as a bias-variance trade-off, and shows how we can reduce the level
of smoothing as the signal-to-noise ratio increases on account of the algorithm
'borrowing strength' across the network.
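The three stages can be illustrated with a deliberately simplified one-dimensional sketch: Gaussian-kernel estimates of the pairwise intensity matrix Λ(t) on a time grid, a projection vector u chosen as the top eigenvector of Σ_t Λ(t)Λ(t)ᵀ (which minimises the PCA-style reconstruction error Σ_t ‖Λ(t) − uuᵀΛ(t)‖²_F over unit vectors), and node trajectories obtained by projecting rows of Λ(t). Undirected interactions, the bandwidth, the embedding dimension of 1 and this particular error criterion are assumptions; the paper's estimator may differ.

```python
import math


def smoothed_intensity(events, n, t, bandwidth):
    """Gaussian-kernel estimate of the n x n pairwise intensity matrix at time t."""
    lam = [[0.0] * n for _ in range(n)]
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    for i, j, s in events:
        k = norm * math.exp(-0.5 * ((t - s) / bandwidth) ** 2)
        lam[i][j] += k
        lam[j][i] += k  # assumption: interactions are undirected
    return lam


def top_eigenvector(M, iters=200):
    """Power iteration; M is symmetric positive semi-definite here."""
    v = [1.0] * len(M)
    for _ in range(iters):
        w = [sum(M[i][k] * v[k] for k in range(len(M))) for i in range(len(M))]
        norm = math.sqrt(sum(a * a for a in w))
        v = [a / norm for a in w]
    return v


def ipp_1d(events, n, grid, bandwidth):
    """Return one scalar trajectory per node, evaluated on the time grid."""
    lams = [smoothed_intensity(events, n, t, bandwidth) for t in grid]
    # M = sum_t Lambda(t) Lambda(t)^T; its top eigenvector defines the projection
    M = [[sum(L[i][k] * L[j][k] for L in lams for k in range(n)) for j in range(n)]
         for i in range(n)]
    u = top_eigenvector(M)
    # node i's position at time t: row i of Lambda(t) projected onto u
    return [[sum(L[i][k] * u[k] for k in range(n)) for L in lams] for i in range(n)]


events = [(0, 1, 0.5), (0, 1, 1.0), (1, 2, 2.0)]  # hypothetical (i, j, t) triples
traj = ipp_1d(events, n=4, grid=[0.0, 1.0, 2.0, 3.0], bandwidth=0.5)
```

Node 3 never interacts, so its trajectory stays at zero throughout.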
( 2
min )
Homeostasis is a biological process by which living beings maintain their
internal balance. Previous research suggests that homeostasis is a learned
behaviour. The recently introduced Homeostatic Regulated Reinforcement Learning
(HRRL) framework attempts to explain this learned homeostatic behavior by
linking Drive Reduction Theory and Reinforcement Learning. This linkage has
been proven in discrete time and space, but not in continuous time and space.
In this work, we extend the HRRL framework to a continuous time-space
environment and validate the CTCS-HRRL (Continuous Time Continuous Space HRRL)
framework. We achieve this by designing a model that mimics the homeostatic
mechanisms in a real-world biological agent. This model uses the
Hamilton-Jacobi-Bellman equation and function approximation based on neural
networks and Reinforcement Learning. Through a simulation-based experiment, we
demonstrate the efficacy of this model and uncover evidence of the agent's
ability to dynamically choose policies that favor homeostasis in a
continuously changing internal-state milieu. The results of our experiments
demonstrate that the agent learns homeostatic behaviour in a CTCS environment,
making CTCS-HRRL a promising framework for modelling animal dynamics and
decision-making.
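The abstract does not state the reward, but drive-reduction formulations of homeostatic RL typically define a drive D(h) = (Σ_i |h*_i − h_i|^n)^(1/m) that measures the distance of the internal state h from a setpoint h*, and reward an outcome by how much it reduces that drive. This is a minimal discrete-step sketch of that idea. The exponents n and m are tunable, the values here are illustrative, and the continuous-time CTCS-HRRL version would work with rates rather than single steps.

```python
def drive(h, setpoint, n=4, m=3):
    """Drive D(h) = (sum_i |h*_i - h_i|^n)^(1/m): distance of the internal state from the setpoint."""
    return sum(abs(s - x) ** n for s, x in zip(setpoint, h)) ** (1.0 / m)


def homeostatic_reward(h_now, h_next, setpoint, n=4, m=3):
    """Reward = reduction in drive when moving from h_now to h_next."""
    return drive(h_now, setpoint, n, m) - drive(h_next, setpoint, n, m)
```

Moving the internal state towards the setpoint yields a positive reward and moving away yields a negative one, so maximising return favours homeostasis.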
( 2
min )
This paper studies the effect of adding geometrically smoothed momentum to
the randomized Kaczmarz algorithm, which is an instance of stochastic gradient
descent on a linear least squares loss function. We prove a result about the
expected error in the direction of singular vectors of the matrix defining the
least squares loss. We present several numerical examples illustrating the
utility of our result and pose several questions.
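This is a sketch of randomized Kaczmarz with momentum on a consistent system. The abstract does not give the exact smoothing rule, so the sketch assumes that the momentum term is an exponential (geometric) moving average of past Kaczmarz steps with weight `beta`, which makes it a heavy-ball-style method. As in standard randomized Kaczmarz, rows are sampled with probability proportional to their squared norms. The system size, seed and iteration count are illustrative.

```python
import random


def rk_momentum(A, b, beta=0.5, iters=5000, seed=0):
    """Randomized Kaczmarz where each update is a moving average of projection steps."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    row_norms = [sum(a * a for a in row) for row in A]
    x = [0.0] * n
    v = [0.0] * n
    for _ in range(iters):
        # sample row i with probability proportional to ||a_i||^2
        i = rng.choices(range(m), weights=row_norms)[0]
        residual = b[i] - sum(a * xj for a, xj in zip(A[i], x))
        step = [residual / row_norms[i] * a for a in A[i]]  # plain Kaczmarz projection step
        # geometric smoothing: v averages the current step with the previous update
        v = [beta * vj + (1 - beta) * sj for vj, sj in zip(v, step)]
        x = [xj + vj for xj, vj in zip(x, v)]
    return x


rng = random.Random(1)
A = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(30)]
x_true = [rng.gauss(0, 1) for _ in range(5)]
b = [sum(a * xt for a, xt in zip(row, x_true)) for row in A]
x_hat = rk_momentum(A, b)
```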
( 2
min )
We study a streamable attention-based encoder-decoder model in which either
the decoder, or both the encoder and decoder, operate on pre-defined,
fixed-size windows called chunks. A special end-of-chunk (EOC) symbol advances
decoding from one chunk to the next, effectively replacing the conventional
end-of-sequence symbol. This modification, while minor, makes our model
equivalent to a transducer model that operates on chunks instead of frames,
where EOC corresponds to the blank symbol. We further explore the remaining
differences between a standard transducer and our model. Additionally, we
examine relevant aspects such as long-form speech generalization, beam size,
and length normalization. Through experiments on Librispeech and TED-LIUM-v2,
and by concatenating consecutive sequences for long-form trials, we find that
our streamable model maintains competitive performance compared to the
non-streamable variant and generalizes very well to long-form speech.
( 2
min )
We introduce a novel capacity measure 2sED for statistical models based on
the effective dimension. The new quantity provably bounds the generalization
error under mild assumptions on the model. Furthermore, simulations on standard
data sets and popular model architectures show that 2sED correlates well with
the training error. For Markovian models, we show how to efficiently
approximate 2sED from below through a layerwise iterative approach, which
allows us to tackle deep learning models with a large number of parameters.
Simulation results suggest that the approximation is good for different
prominent models and data sets.
( 2
min )
Despite the success of deep learning-based algorithms, it is widely known
that neural networks may fail to be robust. A popular paradigm to enforce
robustness is adversarial training (AT); however, it introduces many
computational and theoretical difficulties. Recent works have developed a
connection between AT in the multiclass classification setting and
multimarginal optimal transport (MOT), unlocking a new set of tools to study
this problem. In this paper, we leverage the MOT connection to propose
computationally tractable numerical algorithms for computing universal lower
bounds on the optimal adversarial risk and identifying optimal classifiers. We
propose two main algorithms based on linear programming (LP) and entropic
regularization (Sinkhorn). Our key insight is that one can harmlessly truncate
the higher order interactions between classes, preventing the combinatorial run
times typically encountered in MOT problems. We validate these results with
experiments on MNIST and CIFAR-$10$, which demonstrate the tractability of our
approach.
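In the two-class case, the universal lower bound has a simple form, which this sketch illustrates. It is not the paper's LP or Sinkhorn algorithm, which handle many classes through MOT and truncation. For a uniform empirical distribution and closed ε-balls, a class-0 point and a class-1 point whose balls intersect (distance ≤ 2ε) cannot both be classified robustly, so the robust errors form a vertex cover of this bipartite "conflict graph". By König's theorem, the minimum cover equals the maximum matching, which gives the optimal adversarial 0-1 risk in this special case.

```python
import math


def adversarial_risk_lower_bound(class0, class1, eps):
    """Two-class case: max matching in the conflict graph divided by the number of points."""
    # conflict edge: the eps-balls around p and q intersect, so no classifier is robustly right on both
    adj = [[j for j, q in enumerate(class1) if math.dist(p, q) <= 2 * eps] for p in class0]
    match_of = [-1] * len(class1)  # match_of[j] = class-0 point matched to class-1 point j

    def try_augment(i, seen):
        # Kuhn's augmenting-path step for bipartite matching
        for j in adj[i]:
            if not seen[j]:
                seen[j] = True
                if match_of[j] == -1 or try_augment(match_of[j], seen):
                    match_of[j] = i
                    return True
        return False

    matching = sum(try_augment(i, [False] * len(class1)) for i in range(len(class0)))
    return matching / (len(class0) + len(class1))
```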
( 2
min )
Motivated by the entropic optimal transport problem in unbounded settings, we
study versions of Hilbert's projective metric for spaces of integrable
functions of bounded growth. These versions of Hilbert's metric originate from
cones which are relaxations of the cone of all non-negative functions, in the
sense that they include all functions having non-negative integral values when
multiplied with certain test functions. We show that kernel integral operators
are contractions with respect to suitable specifications of such metrics even
for kernels which are not bounded away from zero, provided that the decay to
zero of the kernel is controlled. As an application to entropic optimal
transport, we show exponential convergence of Sinkhorn's algorithm in settings
where the marginal distributions have sufficiently light tails compared to the
growth of the cost function.
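Sinkhorn's algorithm alternately rescales the rows and columns of the Gibbs kernel K = exp(−c/ε) until the resulting coupling matches both marginals. This minimal discrete sketch uses a bounded cost and finitely supported marginals, so it illustrates the iteration itself rather than the unbounded setting the paper analyses. The values of `eps` and `iters` are illustrative.

```python
import math


def sinkhorn(mu, nu, cost, eps=0.1, iters=500):
    """Entropic OT coupling between discrete marginals mu and nu for a given cost matrix."""
    K = [[math.exp(-c / eps) for c in row] for row in cost]
    u = [1.0] * len(mu)
    v = [1.0] * len(nu)
    for _ in range(iters):
        # rescale rows to match mu, then columns to match nu
        u = [mu[i] / sum(K[i][j] * v[j] for j in range(len(nu))) for i in range(len(mu))]
        v = [nu[j] / sum(K[i][j] * u[i] for i in range(len(mu))) for j in range(len(nu))]
    return [[u[i] * K[i][j] * v[j] for j in range(len(nu))] for i in range(len(mu))]


P = sinkhorn([0.5, 0.5], [0.25, 0.75], [[0.0, 1.0], [1.0, 0.0]], eps=0.5)
```

Each half-step is exact for one marginal. The convergence question the paper studies is how fast the other marginal's error shrinks, which in the bounded case follows from the contraction of K in Hilbert's projective metric.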
( 2
min )
In the realm of machine learning and statistical modeling, practitioners
often work under the assumption of accessible, static, labeled data for
evaluation and training. However, this assumption often deviates from reality,
where data may be private, encrypted, difficult-to-measure, or unlabeled. In
this paper, we bridge this gap by adapting the Hui-Walter paradigm, a method
traditionally applied in epidemiology and medicine, to the field of machine
learning. This approach enables us to estimate key performance metrics such as
false positive rate, false negative rate, and priors in scenarios where no
ground truth is available. We further extend this paradigm to handle online
data, opening up new possibilities for dynamic data environments. Our
methodology involves partitioning data into latent classes to simulate multiple
data populations (if natural populations are unavailable) and independently
training models to replicate multiple tests. By cross-tabulating binary
outcomes across ensemble categorizers and multiple populations, we are able to
estimate unknown parameters through Gibbs sampling, eliminating the need for
ground-truth or labeled data. This paper showcases the potential of our
methodology to transform machine learning practices by allowing for accurate
model assessment under dynamic and uncertain data conditions.
( 2
min )
Two-timescale stochastic approximation (TTSA) is among the most general
frameworks for iterative stochastic algorithms. This includes well-known
stochastic optimization methods such as SGD variants and those designed for
bilevel or minimax problems, as well as reinforcement learning methods such as
the family of gradient-based temporal difference (GTD) algorithms. In this
paper, we conduct an in-depth asymptotic analysis of TTSA under controlled
Markovian noise via a central limit theorem (CLT), uncovering the coupled
dynamics of TTSA influenced by the underlying Markov chain, which previous CLT
results for TTSA, limited to martingale difference noise, did not address.
Building upon our CLT, we extend the application of efficient sampling
strategies from vanilla SGD to a wider TTSA context in distributed learning,
thus broadening the scope of Hu et al. (2022). In addition, we leverage our CLT
result to deduce the statistical properties of GTD algorithms with nonlinear
function approximation using Markovian samples and show their identical
asymptotic performance, a perspective not evident from current finite-time
bounds.
( 2
min )
MIT CSAIL researchers develop advanced machine-learning models that outperform current methods in detecting pancreatic ductal adenocarcinoma.
( 9
min )
PhD students interning with the MIT-IBM Watson AI Lab look to improve natural language usage.
( 10
min )
Mark Swinnerton aims to fight climate change by transforming abandoned mines into storage tanks of renewable energy. The CEO of startup Green Gravity is prototyping his ambitious vision in a warehouse 60 miles south of Sydney, Australia, and simulating it in NVIDIA Omniverse, a platform for building 3D workflows and applications. The concept requires some …
( 6
min )
Hold on to your seats: this GFN Thursday is unleashing dinosaurs, crowns and more in the cloud. Catch it all on Capcom’s Exoprimal and Ubisoft’s Prince of Persia: The Lost Crown, leading 10 new games joining the GeForce NOW library this week.
Suit Up, Adapt, Survive
Don cutting-edge exosuit technology and battle ferocious dinosaurs …
( 6
min )
In this work, we study deep signature algorithms for path-dependent
options. We extend the backward scheme in [Huré-Pham-Warin. Mathematics of
Computation 89, no. 324 (2020)] for state-dependent FBSDEs with reflections to
path-dependent FBSDEs with reflections, by adding the signature layer to the
backward scheme. Our algorithm applies to both European and American type
option pricing problems where the payoff function depends on the whole path of
the underlying forward stock process. We prove the convergence of our
numerical algorithm with explicit dependence on the truncation order of the
signature and the neural network approximation errors. Numerical examples for
the algorithm are provided, including an Amerasian option under the
Black-Scholes model, an American option with a path-dependent geometric mean
payoff function, and Shiryaev's optimal stopping problem.
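The signature layer feeds iterated integrals of the path into the network. This sketch computes the depth-2 truncated signature of a piecewise-linear path using Chen's identity. The truncation order of 2 and the plain list-of-points input are assumptions; practical pipelines often add time as an extra channel. The two example paths share their endpoints but get different level-2 terms, which is how the signature captures path dependence.

```python
def signature_level2(path):
    """Depth-2 signature (levels 1 and 2) of a piecewise-linear path given as d-dimensional points."""
    d = len(path[0])
    s1 = [0.0] * d
    s2 = [[0.0] * d for _ in range(d)]
    for p, q in zip(path, path[1:]):
        delta = [qi - pi for pi, qi in zip(p, q)]
        # Chen's identity: appending a linear segment adds s1 (x) delta + delta (x) delta / 2
        for i in range(d):
            for j in range(d):
                s2[i][j] += s1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        s1 = [a + b for a, b in zip(s1, delta)]  # update level 1 after level 2 uses the old value
    return s1, s2


s1_a, s2_a = signature_level2([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)])  # right, then up
s1_b, s2_b = signature_level2([(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)])  # up, then right
```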
( 2
min )
Popular guidance for denoising diffusion probabilistic models (DDPMs) linearly
combines distinct conditional models to provide enhanced control over
samples. However, this approach overlooks nonlinear effects that become
significant when the guidance scale is large. To address this issue, we propose
characteristic guidance, a sampling method that provides a first-principles
non-linear correction for classifier-free guided DDPMs. Such correction forces
the guided DDPMs to respect the Fokker-Planck equation of their underlying
diffusion process, in a way that is training-free, derivative-free, and
compatible with existing sampling methods. Experiments show that characteristic
guidance enhances control and reduces color and exposure issues in image
generation, proving effective in diverse applications ranging from latent space
sampling to solving physics problems like magnet phase transitions.
( 2
min )
Because of its privacy-preserving capability, federated learning (FL) has
attracted significant attention from both academia and industry. However, when
FL is implemented over wireless networks, it is not clear how much
communication error it can tolerate. This paper investigates the
robustness of FL to uplink and downlink communication errors. Our
theoretical analysis reveals that the robustness depends on two critical
parameters, namely the number of clients and the numerical range of model
parameters. It is also shown that the uplink communication in FL can tolerate a
higher bit error rate (BER) than downlink communication, and this difference is
quantified by a proposed formula. The findings and theoretical analyses are
further validated by extensive experiments.
( 2
min )
Algorithmic generalization in machine learning refers to the ability to learn
the underlying algorithm that generates data in a way that generalizes
out-of-distribution. This is generally considered a difficult task for most
machine learning algorithms. Here, we analyze algorithmic generalization when
counting is required, either implicitly or explicitly. We show that standard
Transformers are based on architectural decisions that hinder
out-of-distribution performance for such tasks. In particular, we discuss the
consequences of using layer normalization and of normalizing the attention
weights via softmax. By ablating the problematic operations, we
demonstrate that a modified Transformer can exhibit good algorithmic
generalization on counting while using a very lightweight
architecture.
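This toy example, not the paper's modified architecture, illustrates why softmax normalisation can hide counts. If a query scores n identical tokens equally, the softmax-weighted average is the same for every n. An unnormalised sum of the same exponentiated scores grows with n.

```python
import math


def softmax_attention(scores, values):
    """Standard attention readout: weights are normalised to sum to one."""
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    return sum(wi / z * v for wi, v in zip(w, values))


def unnormalized_attention(scores, values):
    """Drops the softmax denominator, so the output scales with the number of attended tokens."""
    return sum(math.exp(s) * v for s, v in zip(scores, values))
```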
( 2
min )
In this paper, we propose a method for knowledge graph construction in power
distribution networks. This method leverages entity features, which involve
their semantic, phonetic, and syntactic characteristics, in both the knowledge
graph of the distribution network and the dispatching texts. An enhanced model
based on a Convolutional Neural Network is used to match
dispatch text entities with those in the knowledge graph. The effectiveness of
this model is evaluated through experiments in real-world power distribution
dispatch scenarios. The results indicate that, compared with the baselines, the
proposed model excels in linking a variety of entity types, demonstrating high
overall accuracy in the power distribution knowledge graph construction task.
( 2
min )
Understanding a model's sensitivity to its training data is crucial but can
also be challenging and costly, especially during training. To simplify such
issues, we present the Memory-Perturbation Equation (MPE), which relates a
model's sensitivity to perturbations in its training data. Derived using
Bayesian principles, the MPE unifies existing sensitivity measures, generalizes
them to a wide variety of models and algorithms, and unravels useful properties
regarding sensitivities. Our empirical results show that sensitivity estimates
obtained during training can be used to faithfully predict generalization on
unseen test data. The proposed equation is expected to be useful for future
research on robust and adaptive learning.
( 2
min )
Accelerating compute-intensive non-real-time beamforming algorithms in
ultrasound imaging using deep learning architectures has been gaining momentum
in the recent past. Nonetheless, the complexity of state-of-the-art deep
learning techniques poses challenges for deployment on resource-constrained
edge devices. In this work, we propose a novel vision-transformer-based tiny
beamformer (Tiny-VBF), which works on the raw radio-frequency channel data
acquired through single-angle plane wave insonification. Our Tiny-VBF provides
fast envelope detection at a very low computational cost, i.e., 0.34 GOPs/frame
for a frame size of 368 x 128, in comparison to the
state-of-the-art deep learning models. It also exhibited an 8% increase in
contrast and gains of 5% and 33% in axial and lateral resolution respectively
when compared to Tiny-CNN on in-vitro dataset. Additionally, our model showed a
4.2% increase in contrast and gains of 4% and 20% in axial and lateral
resolution respectively when compared against conventional Delay-and-Sum (DAS)
beamformer. We further propose an accelerator architecture and implement our
Tiny-VBF model on a Zynq UltraScale+ MPSoC ZCU104 FPGA using a hybrid
quantization scheme with 50% less resource consumption compared to the
floating-point implementation, while preserving the image quality.
( 2
min )
Partial differential equations are often used in the spatial-temporal
modeling of complex dynamical systems in many engineering applications. In this
work, we build on the recent progress of operator learning and present a
data-driven modeling framework that is continuous in both space and time. A key
feature of the proposed model is the resolution-invariance with respect to both
spatial and temporal discretizations, without demanding abundant training data
in different temporal resolutions. To improve the long-term performance of the
calibrated model, we further propose a hybrid optimization scheme that
leverages both gradient-based and derivative-free optimization methods and
efficiently trains on both short-term time series and long-term statistics. We
investigate the performance of the spatial-temporal continuous learning
framework with three numerical examples, including the viscous Burgers'
equation, the Navier-Stokes equations, and the Kuramoto-Sivashinsky equation.
The results confirm the resolution-invariance of the proposed modeling
framework and also demonstrate stable long-term simulations with only
short-term time series data. In addition, we show that the proposed model can
better predict long-term statistics via the hybrid optimization scheme with a
combined use of short-term and long-term data.
( 2
min )
Machine learning (ML) applications in medical artificial intelligence (AI)
systems have shifted from traditional and statistical methods to increasing
application of deep learning models. This survey navigates the current
landscape of multimodal ML, focusing on its profound impact on medical image
analysis and clinical decision support systems. Emphasizing challenges and
innovations in addressing multimodal representation, fusion, translation,
alignment, and co-learning, the paper explores the transformative potential of
multimodal models for clinical predictions. It also questions practical
implementation of such models, bringing attention to the dynamics between
decision support systems and healthcare providers. Despite advancements,
challenges such as data biases and the scarcity of "big data" in many
biomedical domains persist. We conclude with a discussion on effective
innovation and collaborative efforts to further the miss
( 2
min )
This article studies how to intervene against statistical discrimination,
when it is based on beliefs generated by machine learning, rather than by
humans. Unlike beliefs formed by a human mind, machine learning-generated
beliefs are verifiable. This allows interventions to move beyond simple,
belief-free designs like affirmative action, to more sophisticated ones that
constrain decision makers in ways that depend on what they are thinking. Such
mind reading interventions can perform well where affirmative action does not,
even when the beliefs being conditioned on are possibly incorrect and biased.
( 2
min )
Linear temporal logic (LTL) and omega-regular objectives (a superset of LTL)
have seen recent use as a way to express non-Markovian objectives in
reinforcement learning. We introduce a model-based probably approximately
correct (PAC) learning algorithm for omega-regular objectives in Markov
decision processes (MDPs). As part of the development of our algorithm, we
introduce the epsilon-recurrence time: a measure of the speed at which a policy
converges to the satisfaction of the omega-regular objective in the limit. We
prove that our algorithm only requires a polynomial number of samples in the
relevant parameters, and perform experiments which confirm our theory.
My research investigates the use of cutting-edge hybrid deep learning models
to accurately differentiate between AI-generated text and human writing. I
applied a robust methodology, utilising a carefully selected dataset comprising
AI and human texts from various sources, each tagged with instructions.
Advanced natural language processing techniques facilitated the analysis of
textual features. By combining sophisticated neural networks, the custom model
was able to detect nuanced differences between AI-generated and human content.
The correlation between the sharpness of loss minima and generalisation in
the context of deep neural networks has been subject to discussion for a long
time. Whilst mostly investigated in the context of selected benchmark data sets
in the area of computer vision, we explore this aspect for the acoustic scene
classification task of the DCASE2020 challenge data. Our analysis is based on
two-dimensional filter-normalised visualisations and a derived sharpness
measure. Our exploratory analysis shows that sharper minima tend to exhibit better
generalisation than flat minima (even more so for out-of-domain data recorded
from previously unseen devices), thus adding to the dispute about the better
generalisation capabilities of flat minima. We further find that, in
particular, the choice of optimisers is a main driver of the sharpness of
minima and we discuss resulting limitations with respect to comparability. Our
code, trained model states and loss landscape visualisations are publicly
available.
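To make the notion of a filter-normalised sharpness measure concrete, here is a minimal one-dimensional sketch. It is not the measure used in the study: it assumes parameters are grouped into "filters" (plain lists of floats), rescales a direction filter by filter to the norms of the corresponding weight filters, and reports the largest loss increase along that line within a hypothetical `radius`. The names `filter_normalize` and `sharpness_1d` are illustrative.

```python
import math
from typing import Callable, List

# Parameters grouped into filters: a list of filters, each a list of floats.
Filters = List[List[float]]


def filter_normalize(direction: Filters, weights: Filters) -> Filters:
    """Rescale each filter of `direction` to the norm of the matching weight filter."""
    normalized = []
    for d, w in zip(direction, weights):
        d_norm = math.sqrt(sum(v * v for v in d))
        w_norm = math.sqrt(sum(v * v for v in w))
        scale = w_norm / d_norm if d_norm > 0 else 0.0
        normalized.append([v * scale for v in d])
    return normalized


def sharpness_1d(loss: Callable[[Filters], float], weights: Filters,
                 direction: Filters, radius: float = 1.0, n_points: int = 21) -> float:
    """Largest loss increase along a filter-normalised line through `weights`."""
    d = filter_normalize(direction, weights)
    base = loss(weights)
    worst = 0.0
    for i in range(n_points):
        alpha = -radius + 2.0 * radius * i / (n_points - 1)  # grid over [-radius, radius]
        shifted = [[w + alpha * dv for w, dv in zip(wf, df)]
                   for wf, df in zip(weights, d)]
        worst = max(worst, loss(shifted) - base)
    return worst
```

With a toy quadratic loss, a curvature ten times larger yields a sharpness value ten times larger, which is the kind of contrast such measures are meant to expose.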
Traditional data-driven deep learning models often struggle with high
training costs, error accumulation, and poor generalizability in complex
physical processes. Physics-informed deep learning (PiDL) addresses these
challenges by incorporating physical principles into the model. Most PiDL
approaches regularize training by embedding governing equations into the loss
function, yet this depends heavily on extensive hyperparameter tuning to weigh
each loss term. To address this, we propose to leverage prior physics knowledge
by "baking" the discretized governing equations into the neural network
architecture via the connection between partial differential equation (PDE)
operators and network structures, resulting in a PDE-preserved neural
network (PPNN). This method, which embeds discretized PDEs through convolutional
residual networks in a multi-resolution setting, substantially improves
generalizability and long-term prediction accuracy, outperforming conventional
black-box models. The effectiveness of the proposed method has been
demonstrated on various spatiotemporal dynamical systems governed by PDEs,
including the reaction-diffusion, Burgers', and Navier-Stokes equations.
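The idea of "baking" a discretized operator into the architecture can be illustrated with a minimal sketch, assuming a 1-D reaction-diffusion equation $u_t = D u_{xx} + R(u)$, periodic boundaries, and explicit Euler time stepping. The Laplacian is a fixed, non-trainable convolution with kernel $[1, -2, 1]/\Delta x^2$ inside a residual update, and `correction` is a hypothetical placeholder for the trainable network branch. This is not the PPNN implementation from the paper.

```python
from typing import Callable, List


def laplacian_stencil(u: List[float], dx: float) -> List[float]:
    """Fixed (non-trainable) convolution with kernel [1, -2, 1] / dx^2, periodic boundaries."""
    n = len(u)
    return [(u[(i - 1) % n] - 2.0 * u[i] + u[(i + 1) % n]) / dx**2 for i in range(n)]


def ppnn_step(u: List[float], dt: float, dx: float, diffusivity: float,
              reaction: Callable[[float], float],
              correction: Callable[[List[float]], List[float]]) -> List[float]:
    """Residual update: u + dt * (discretized PDE terms + learned correction)."""
    lap = laplacian_stencil(u, dx)
    learned = correction(u)  # hypothetical stand-in for a trainable CNN branch
    return [u[i] + dt * (diffusivity * lap[i] + reaction(u[i]) + learned[i])
            for i in range(len(u))]
```

With no reaction and a zero correction, one step spreads a unit spike `[0, 0, 1, 0, 0]` to roughly `[0, 0.1, 0.8, 0.1, 0]` and conserves the total mass, because the PDE structure is hard-wired rather than learned.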
We introduce SPIRAL, a SuPerlinearly convergent Incremental pRoximal
ALgorithm, for solving nonconvex regularized finite sum problems under a
relative smoothness assumption. Each iteration of SPIRAL consists of an inner
and an outer loop. It combines incremental gradient updates with a linesearch
that has the remarkable property of never being triggered asymptotically,
leading to superlinear convergence under mild assumptions at the limit point.
Simulation results with L-BFGS directions on different convex, nonconvex, and
non-Lipschitz differentiable problems show that our algorithm and its
adaptive variant are competitive with the state of the art.
The singular subspaces perturbation theory is of fundamental importance in
probability and statistics. It has various applications across different
fields. We consider two arbitrary matrices where one is a leave-one-column-out
submatrix of the other one and establish a novel perturbation upper bound for
the distance between the two corresponding singular subspaces. It is
well-suited for mixture models and results in a sharper and finer statistical
analysis than classical perturbation bounds such as Wedin's Theorem. Empowered
by this leave-one-out perturbation theory, we provide a deterministic entrywise
analysis for the performance of spectral clustering under mixture models. Our
analysis leads to an explicit exponential error rate for spectral clustering of
sub-Gaussian mixture models. For the mixture of isotropic Gaussians, the rate
is optimal under a weaker signal-to-noise condition than that of Löffler et
al. (2021).
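As background for the spectral clustering analysis, the following minimal sketch clusters a two-component mixture by the sign of each point's entry in the leading left singular vector, computed with plain power iteration. It assumes two clusters with means $\pm\mu$ (so no centring is needed) and illustrates only vanilla spectral clustering, not the leave-one-out perturbation bound itself; the function names are illustrative.

```python
import math
from typing import List


def top_left_singular_vector(X: List[List[float]], iters: int = 100) -> List[float]:
    """Power iteration on X X^T; returns a unit vector u (defined up to sign)."""
    n, d = len(X), len(X[0])
    u = [float(i + 1) for i in range(n)]  # deterministic, non-degenerate start
    for _ in range(iters):
        v = [sum(X[i][j] * u[i] for i in range(n)) for j in range(d)]  # v = X^T u
        u = [sum(X[i][j] * v[j] for j in range(d)) for i in range(n)]  # u = X v
        norm = math.sqrt(sum(x * x for x in u))
        u = [x / norm for x in u]
    return u


def spectral_two_clusters(X: List[List[float]]) -> List[int]:
    """Assign each row a cluster label by the sign of its entry in the top left singular vector."""
    u = top_left_singular_vector(X)
    return [1 if ui >= 0 else 0 for ui in u]
```

Cluster labels are only defined up to swapping, since the singular vector is only defined up to sign.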
Sequential recommendation models, models that learn from chronological
user-item interactions, outperform traditional recommendation models in many
settings. Despite the success of sequential recommendation models, their
robustness has recently come into question. Two properties unique to the nature
of sequential recommendation models may impair their robustness: the cascade
effects induced during training and the model's tendency to rely too heavily on
temporal information. To address these vulnerabilities, we propose
Cascade-guided Adversarial training, a new adversarial training procedure that
is specifically designed for sequential recommendation models. Our approach
harnesses the intrinsic cascade effects present in sequential modeling to
produce strategic adversarial perturbations to item embeddings during training.
Experiments on training state-of-the-art sequential models on four public
datasets from different domains show that our training approach produces
superior model ranking accuracy and superior model robustness to real item
replacement perturbations when compared to both standard model training and
generic adversarial training.
The introduction of computerized medical records in hospitals has reduced
burdensome activities like manual writing and information fetching. However,
the data contained in medical records are still far underutilized, primarily
because extracting data from unstructured textual medical records takes time
and effort. Information Extraction, a subfield of Natural Language Processing,
can help clinical practitioners overcome this limitation by using automated
text-mining pipelines. In this work, we created the first Italian
neuropsychiatric Named Entity Recognition dataset, PsyNIT, and used it to
develop a Transformers-based model. Moreover, we collected and leveraged three
external independent datasets to implement an effective multicenter model, with
overall F1-score 84.77%, Precision 83.16%, Recall 86.44%. The lessons learned
are: (i) the crucial role of a consistent annotation process and (ii) a
fine-tuning strategy that combines classical methods with a "low-resource"
approach. This allowed us to establish methodological guidelines that pave the
way for Natural Language Processing studies in less-resourced languages.
MCMC algorithms offer empirically efficient tools for sampling from a target
distribution $\pi(x) \propto \exp(-V(x))$. However, on the theory side, MCMC
algorithms suffer from slow mixing when $\pi(x)$ is non-log-concave. Our
work examines this gap and shows that when a Poincaré-style inequality holds on
a subset $\mathcal{X}$ of the state space, the conditional distribution of MCMC
iterates over $\mathcal{X}$ mixes fast to the true conditional distribution.
This fast mixing guarantee can hold in cases when global mixing is provably
slow. We formalize the statement and quantify the conditional mixing rate. We
further show that conditional mixing can have interesting implications for
sampling from mixtures of Gaussians, parameter estimation for Gaussian mixture
models and Gibbs sampling with well-connected local minima.
We study gradient descent under linearly correlated noise. Our work is
motivated by recent practical methods for optimization with differential
privacy (DP), such as DP-FTRL, which achieve strong performance in settings
where privacy amplification techniques are infeasible (such as in federated
learning). These methods inject privacy noise through a matrix factorization
mechanism, making the noise linearly correlated over iterations. We propose a
simplified setting that distills key facets of these methods and isolates the
impact of linearly correlated noise. We analyze the behavior of gradient
descent in this setting, for both convex and non-convex functions. Our analysis
is demonstrably tighter than prior work and recovers multiple important special
cases exactly (including anticorrelated perturbed gradient descent). We use our
results to develop new, effective matrix factorizations for differentially
private optimization, and highlight the benefits of these factorizations
theoretically and empirically.
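A minimal sketch of the simplified setting, assuming a scalar objective. The noise injected at step $t$ is $z_t = \sum_{j \le t} B_{tj} w_j$ for a lower-triangular matrix $B$ and i.i.d. Gaussian $w$. The identity matrix gives ordinary perturbed gradient descent. Putting 1 on the diagonal and -1 on the subdiagonal gives the anticorrelated special case $z_t = w_t - w_{t-1}$, whose cumulative noise telescopes to $w_{T-1}$. The helper names are hypothetical, and no privacy accounting is included.

```python
import random
from typing import Callable, List, Tuple

Matrix = List[List[float]]


def correlated_noise(B: Matrix, w: List[float]) -> List[float]:
    """z_t = sum_{j <= t} B[t][j] * w_j for a lower-triangular mixing matrix B."""
    return [sum(B[t][j] * w[j] for j in range(t + 1)) for t in range(len(w))]


def identity_matrix(T: int) -> Matrix:
    """Independent noise: z_t = w_t (plain perturbed gradient descent)."""
    return [[1.0 if j == t else 0.0 for j in range(T)] for t in range(T)]


def anticorrelated_matrix(T: int) -> Matrix:
    """1 on the diagonal, -1 on the subdiagonal: z_t = w_t - w_{t-1}."""
    return [[1.0 if j == t else (-1.0 if j == t - 1 else 0.0) for j in range(T)]
            for t in range(T)]


def noisy_gd(grad: Callable[[float], float], x0: float, lr: float,
             B: Matrix, sigma: float, seed: int = 0) -> Tuple[float, List[float]]:
    """Run x_{t+1} = x_t - lr * (grad(x_t) + z_t) with z = B w, w ~ N(0, sigma^2 I)."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, sigma) for _ in range(len(B))]
    z = correlated_noise(B, w)
    x = x0
    for z_t in z:
        x = x - lr * (grad(x) + z_t)
    return x, z
```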
Recurrent neural networks (RNNs) are a class of neural networks that have
emerged from the paradigm of artificial intelligence and have enabled many
advances in the field of natural language processing.
Interestingly, these architectures were shown to be powerful ansatze to
approximate the ground state of quantum systems. Here, we build on the
results of [Phys. Rev. Research 2, 023358 (2020)] and construct a more powerful
RNN wave function ansatz in two dimensions. We use symmetry and annealing to
obtain accurate estimates of ground state energies of the two-dimensional (2D)
Heisenberg model, on the square lattice and on the triangular lattice. We show
that our method is superior to Density Matrix Renormalisation Group (DMRG) for
system sizes larger than or equal to $14 \times 14$ on the triangular lattice.
We study the convergence of stochastic gradient descent (SGD) for non-convex
objective functions. We establish local convergence with positive
probability under the local Łojasiewicz condition introduced by Chatterjee
(2022) and an additional local structural assumption on the loss function
landscape. A key component of our proof is to ensure that the entire
trajectory of SGD stays inside the local region with positive probability.
We also provide examples of neural networks with finite widths for which our
assumptions hold.
In industrial deep learning applications, manually labeled data contains a
certain amount of noise. To address this problem and achieve a score above 90
on the dev dataset, we present a simple method that finds the noisy data and
has humans re-label it, using the model predictions as references during
labeling. In this paper, we illustrate our idea on a broad set of deep
learning tasks, including classification, sequence tagging, object detection,
sequence generation, and click-through rate prediction. Evaluation results on
the dev dataset and human evaluation results verify our idea.
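The abstract does not spell out how noisy examples are found, so the following is only a sketch under an assumed rule. An example is flagged when the model's top prediction disagrees with the given label and its probability is at least a hypothetical `threshold`. The prediction is then returned alongside the index so a human annotator can use it as a reference when re-labeling.

```python
from typing import List, Sequence, Tuple


def flag_suspicious_labels(probs: Sequence[Sequence[float]], labels: Sequence[int],
                           threshold: float = 0.9) -> List[Tuple[int, int]]:
    """Return (index, model_prediction) pairs where a confident prediction disagrees with the label."""
    flagged = []
    for i, (p, y) in enumerate(zip(probs, labels)):
        pred = max(range(len(p)), key=p.__getitem__)  # argmax over class probabilities
        if pred != y and p[pred] >= threshold:
            flagged.append((i, pred))  # prediction is shown to the human re-labeller
    return flagged
```

Low-confidence disagreements (such as 0.55 versus 0.45) are left alone here, on the assumption that they are more likely to reflect model uncertainty than label noise.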
We examine how the mutual information between the output model and the
empirical sample relates to the generalization of the algorithm in the
context of stochastic convex optimization. Despite increasing interest in
information-theoretic generalization bounds, it is uncertain whether these bounds
can provide insight into the exceptional performance of various learning
algorithms. Our study of stochastic convex optimization reveals that, for true
risk minimization, dimension-dependent mutual information is necessary. This
indicates that existing information-theoretic generalization bounds fall short
in capturing the generalization capabilities of algorithms like SGD and
regularized ERM, which have dimension-independent sample complexity.
Despite the significant progress made by transformer models in machine
reading comprehension tasks, they still fall short in handling complex
reasoning tasks due to the absence of explicit knowledge in the input sequence.
To address this limitation, many recent works have proposed injecting external
knowledge into the model. However, selecting relevant external knowledge,
ensuring its availability, and handling the additional processing steps it
requires remain challenging. In this paper, we introduce a novel attention pattern that
integrates reasoning knowledge derived from a heterogeneous graph into the
transformer architecture without relying on external knowledge. The proposed
attention pattern comprises three key elements: global-local attention for word
tokens, graph attention for entity tokens that exhibit strong attention towards
tokens connected in the graph as opposed to those unconnected, and the
consideration of the type of relationship between each entity token and word
token. This results in optimized attention between the two if a relationship
exists. The pattern is coupled with special relative position labels, allowing
it to integrate with LUKE's entity-aware self-attention mechanism. The
experimental findings corroborate that our model outperforms both the
cutting-edge LUKE-Graph and the baseline LUKE model across two distinct
datasets: ReCoRD, emphasizing commonsense reasoning, and WikiHop, focusing on
multi-hop reasoning challenges.
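A simplified sketch of graph-restricted attention, assuming a hard mask. Word tokens attend to every token, while entity tokens attend only to themselves and to tokens connected to them in the graph. The proposed pattern is softer than this: it strengthens rather than restricts attention to connected tokens, uses global-local attention for words, and adds relation types and relative position labels. Treat `build_mask` and `masked_softmax` as hypothetical illustrations rather than the model's mechanism.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def build_mask(is_entity: Sequence[bool], edges: Sequence[Tuple[int, int]]) -> List[List[bool]]:
    """Word tokens see every token; entity tokens see only themselves and graph neighbours."""
    n = len(is_entity)
    neighbours: Dict[int, Set[int]] = {i: {i} for i in range(n)}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return [[(not is_entity[i]) or (j in neighbours[i]) for j in range(n)] for i in range(n)]


def masked_softmax(scores: List[List[float]], mask: List[List[bool]]) -> List[List[float]]:
    """Row-wise softmax; masked-out positions get weight exactly 0."""
    weights = []
    for row, allowed in zip(scores, mask):
        top = max(s for s, ok in zip(row, allowed) if ok)  # subtract max for stability
        exps = [math.exp(s - top) if ok else 0.0 for s, ok in zip(row, allowed)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights
```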
Characters do not convey meaning, but sequences of characters do. We propose
an unsupervised distributional method to learn the abstract meaningful units in
a sequence of characters. Rather than segmenting the sequence, our Dynamic
Capacity Slot Attention model discovers continuous representations of the
objects in the sequence, extending an architecture for object discovery in
images. We train our model on different languages and evaluate the quality of
the obtained representations with forward and reverse probing classifiers.
These experiments show that our model succeeds in discovering units which are
similar to those proposed previously in form, content and level of abstraction,
and which show promise for capturing meaningful information at a higher level
of abstraction.
Recent years have seen rapid development of descriptor generation based on
representation learning of extremely diverse molecules, especially those that
apply natural language processing (NLP) models to SMILES, a literal
representation of molecular structure. However, little research has been done
on how these models understand chemical structure. To open this black box,
we investigated the relationship between the learning progress of SMILES and
chemical structure using a representative NLP model, the Transformer. We show
that while the Transformer learns partial structures of molecules quickly, it
requires extended training to understand overall structures. Consistent with this, the
accuracy of molecular property predictions using descriptors generated from
models at different learning steps was similar from the beginning to the end of
training. Furthermore, we found that the Transformer requires particularly long
training to learn chirality and sometimes stagnates with low performance due to
misunderstanding of enantiomers. These findings are expected to deepen the
understanding of NLP models in chemistry.
Photovoltaic (PV) power generation has emerged as one of the leading renewable
energy sources. Yet, its production is characterized by high uncertainty, being
dependent on weather conditions like solar irradiance and temperature.
Predicting PV production, even 24 hours ahead, remains a challenge and
leads energy providers to keep (often carbon-emitting) plants idling. In this
paper, we introduce a Long-Term Recurrent Convolutional Network using Numerical
Weather Predictions (NWP) to predict, in turn, PV production in the 24-hour and
48-hour forecast horizons. This network architecture fully leverages both
temporal and spatial weather data, sampled over the whole geographical area of
interest. We train our model on an NWP dataset from the National Oceanic and
Atmospheric Administration (NOAA) to predict spatially aggregated PV production
in Germany. We compare its performance to the persistence model and
state-of-the-art methods.
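For reference, a persistence baseline can be sketched as follows. The sketch assumes hourly data and the common daily-persistence variant, in which each forecast hour repeats the value observed 24 hours earlier. For the 48-hour horizon, this means the last observed day is repeated twice. The paper may define its persistence model differently, and the function names are illustrative.

```python
from typing import List, Sequence


def daily_persistence_forecast(history: Sequence[float], horizon_hours: int,
                               period: int = 24) -> List[float]:
    """Forecast each hour as the value `period` hours earlier (repeating the last day)."""
    if len(history) < period:
        raise ValueError("need at least one full period of history")
    series = list(history)
    for _ in range(horizon_hours):
        series.append(series[-period])
    return series[len(history):]


def mean_absolute_error(pred: Sequence[float], actual: Sequence[float]) -> float:
    """Average absolute deviation between forecast and observed production."""
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(pred)
```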
The use of mini-batches of data in training artificial neural networks is
nowadays very common. Despite its broad usage, theories explaining
quantitatively how large or small the optimal mini-batch size should be are
missing. This work presents a systematic attempt at understanding the role of
the mini-batch size in training two-layer neural networks. Working in the
teacher-student scenario, with a sparse teacher, and focusing on tasks of
different complexity, we quantify the effects of changing the mini-batch size
$m$. We find that often the generalization performances of the student strongly
depend on $m$ and may undergo sharp phase transitions at a critical value
$m_c$, such that for $m<m_c$ the training process fails, while for $m>m_c$ the
student learns perfectly or generalizes the teacher very well. Phase
transitions are induced by collective phenomena first discovered in
statistical mechanics and later observed in many fields of science. Observing a
phase transition by varying the mini-batch size across different architectures
raises several questions about the role of this hyperparameter in the neural
network learning process.
This paper presents a comprehensive comparative analysis of the performance
of Equivariant Quantum Neural Networks (EQNN) and Quantum Neural Networks
(QNN), juxtaposed against their classical counterparts: Equivariant Neural
Networks (ENN) and Deep Neural Networks (DNN). We evaluate the performance of
each network with two toy examples for a binary classification task, focusing
on model complexity (measured by the number of parameters) and the size of the
training data set. Our results show that the $\mathbb{Z}_2\times \mathbb{Z}_2$
EQNN and the QNN provide superior performance for smaller parameter sets and
modest amounts of training data.
Global optimization of decision trees has been shown to be promising in terms of
accuracy, size, and consequently human comprehensibility. However, many of the
methods used rely on general-purpose solvers for which scalability remains an
issue. Dynamic programming methods have been shown to scale much better because
they exploit the tree structure by solving subtrees as independent subproblems.
However, this only works when an objective can be optimized separately for
subtrees. We explore this relationship in detail and show the necessary and
sufficient conditions for such separability and generalize previous dynamic
programming approaches into a framework that can optimize any combination of
separable objectives and constraints. Experiments on five application domains
show the general applicability of this framework, which also scales far better
than general-purpose solvers.
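The separability idea can be illustrated with a small dynamic program. It assumes binary features, 0/1 labels, and misclassification count as the separable objective. Because the cost of a tree is the sum of the costs of its two subtrees, each subtree is optimized independently and memoized on the pair (subset of examples, remaining depth). This is a sketch, not the paper's framework.

```python
from functools import lru_cache
from typing import FrozenSet, Sequence


def optimal_tree_errors(X: Sequence[Sequence[int]], y: Sequence[int], max_depth: int) -> int:
    """Fewest misclassifications of any binary tree of depth <= max_depth."""
    n_features = len(X[0])

    def leaf_errors(idx: FrozenSet[int]) -> int:
        ones = sum(y[i] for i in idx)
        return min(ones, len(idx) - ones)  # the leaf predicts the majority label

    @lru_cache(maxsize=None)
    def solve(idx: FrozenSet[int], depth: int) -> int:
        best = leaf_errors(idx)
        if depth == 0 or best == 0:
            return best
        for f in range(n_features):
            left = frozenset(i for i in idx if X[i][f] == 0)
            right = idx - left
            if not left or not right:
                continue  # this split does not separate anything
            # Separable objective: each subtree is solved as an independent subproblem.
            best = min(best, solve(left, depth - 1) + solve(right, depth - 1))
        return best

    return solve(frozenset(range(len(X))), max_depth)
```

On XOR data, a depth-1 tree still misclassifies two of the four points, whereas a depth-2 tree classifies all of them correctly.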
We propose Pgx, a suite of board game reinforcement learning (RL)
environments written in JAX and optimized for GPU/TPU accelerators. By
leveraging JAX's auto-vectorization and parallelization over accelerators, Pgx
can efficiently scale to thousands of simultaneous simulations over
accelerators. In our experiments on a DGX-A100 workstation, we discovered that
Pgx can simulate RL environments 10-100x faster than existing implementations
available in Python. Pgx includes RL environments commonly used as benchmarks
in RL research, such as backgammon, chess, shogi, and Go. Additionally, Pgx
offers miniature game sets and baseline models to facilitate rapid research
cycles. We demonstrate the efficient training of the Gumbel AlphaZero algorithm
with Pgx environments. Overall, Pgx provides high-performance environment
simulators for researchers to accelerate their RL experiments. Pgx is available
at this http URL
This paper presents Translatotron 3, a novel approach to unsupervised direct
speech-to-speech translation from monolingual speech-text datasets by combining
masked autoencoder, unsupervised embedding mapping, and back-translation.
Experimental results in speech-to-speech translation tasks between Spanish and
English show that Translatotron 3 outperforms a baseline cascade system,
with an improvement of 18.14 BLEU points on the synthesized
Unpaired-Conversational dataset. In contrast to supervised approaches that
necessitate real paired data, or specialized modeling to replicate
para-/non-linguistic information such as pauses, speaking rates, and speaker
identity, Translatotron 3 is capable of retaining such information. Audio samples
can be found at this http URL
In this paper, we investigate the complexity of feed-forward neural networks
by examining the concept of functional equivalence, which suggests that
different network parameterizations can lead to the same function. We utilize
the permutation invariance property to derive a novel covering number bound for
the class of feedforward neural networks, which reveals that the complexity of
a neural network can be reduced by exploiting this property. We discuss the
extensions to convolutional neural networks, residual networks, and
attention-based models. We demonstrate that functional equivalence benefits
optimization, as overparameterized networks tend to be easier to train since
increasing network width leads to a diminishing volume of the effective
parameter space. Our findings offer new insights into overparameterization and
have significant implications for understanding generalization and optimization
in deep learning.
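Functional equivalence under permutation is easy to check numerically. The sketch below assumes a one-hidden-layer tanh network. Reordering the hidden units (rows of `W1`, entries of `b1` and `w2`) changes the parameter vector but not the function. For generic weights, each function with $n$ hidden units is therefore represented by $n!$ parameter vectors. The names are illustrative.

```python
import math
import random
from typing import List, Sequence, Tuple

Params = Tuple[List[List[float]], List[float], List[float]]


def two_layer_net(x: Sequence[float], params: Params) -> float:
    """f(x) = sum_k w2[k] * tanh(W1[k] . x + b1[k])."""
    W1, b1, w2 = params
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    return sum(a * h for a, h in zip(w2, hidden))


def permute_hidden(params: Params, perm: Sequence[int]) -> Params:
    """Reorder hidden units: a different parameter vector, the same function."""
    W1, b1, w2 = params
    return [W1[p] for p in perm], [b1[p] for p in perm], [w2[p] for p in perm]
```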
The StochAstic Recursive grAdient algoritHm (SARAH) is a variance-reduced
variant of Stochastic Gradient Descent (SGD) that periodically requires a full
gradient of the objective function. In this paper, we remove
the necessity of a full gradient computation. This is achieved by using a
randomized reshuffling strategy and aggregating stochastic gradients obtained
in each epoch. The aggregated stochastic gradients serve as an estimate of a
full gradient in the SARAH algorithm. We provide a theoretical analysis of the
proposed approach and conclude the paper with numerical experiments that
demonstrate the efficiency of this approach.
We consider information-theoretic bounds on expected generalization error for
statistical learning problems in a networked setting. In this setting, there
are $K$ nodes, each with its own independent dataset, and the models from each
node have to be aggregated into a final centralized model. We consider both
simple averaging of the models as well as more complicated multi-round
algorithms. We give upper bounds on the expected generalization error for a
variety of problems, such as those with Bregman divergence or Lipschitz
continuous losses, that demonstrate an improved dependence of $1/K$ on the
number of nodes. These "per node" bounds are in terms of the mutual information
between the training dataset and the trained weights at each node, and are
therefore useful in describing the generalization properties inherent to having
communication or privacy constraints at each node.
Anomaly detection is a challenging task for machine learning algorithms due
to the inherent class imbalance. It is costly and time-consuming to manually
analyse the observed data, so usually only a few known anomalies, if any, are
available. Inspired by generative models and the analysis of the hidden
activations of neural networks, we introduce a novel unsupervised anomaly
detection method called DA3D. Here, we use adversarial autoencoders to generate
anomalous counterexamples based on the normal data only. These artificial
anomalies used during training allow the detection of real, yet unseen
anomalies. With our novel generative approach, we transform the unsupervised
task of anomaly detection to a supervised one, which is more tractable by
machine learning and especially deep learning methods. DA3D surpasses the
performance of state-of-the-art anomaly detection methods in a purely
data-driven way, where no domain knowledge is required.
We introduce a new family of neural network models called Convolutional
Dynamic Alignment Networks (CoDA-Nets), which are performant classifiers with a
high degree of inherent interpretability. Their core building blocks are
Dynamic Alignment Units (DAUs), which linearly transform their input with
weight vectors that dynamically align with task-relevant patterns. As a result,
CoDA-Nets model the classification prediction through a series of
input-dependent linear transformations, allowing for linear decomposition of
the output into individual input contributions. Given the alignment of the
DAUs, the resulting contribution maps align with discriminative input patterns.
These model-inherent decompositions are of high visual quality and outperform
existing attribution methods under quantitative metrics. Further, CoDA-Nets
are performant classifiers, achieving results on par with ResNet and VGG
models on, e.g., CIFAR-10 and TinyImagenet.
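A simplified Dynamic Alignment Unit. The sketch assumes the input-dependent weight vector is $w(x) = a(x)/\lVert a(x) \rVert$ with $a(x) = Wx + b$; the paper's exact parameterization and normalization may differ. Because the output is $w(x)^\top x$, it decomposes exactly into per-input contributions $w_i(x)\,x_i$, which is the linear decomposition the CoDA-Net design relies on.

```python
import math
from typing import List, Sequence, Tuple


def dynamic_alignment_unit(x: Sequence[float], W: Sequence[Sequence[float]],
                           b: Sequence[float]) -> Tuple[float, List[float]]:
    """Simplified DAU: w(x) = a / ||a|| with a = W x + b; output = w(x) . x.

    Returns the output and its per-input contributions w_i(x) * x_i.
    """
    a = [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(W, b)]
    norm = math.sqrt(sum(ai * ai for ai in a)) or 1.0  # avoid division by zero
    w = [ai / norm for ai in a]  # input-dependent, unit-norm weight vector
    contributions = [wi * xi for wi, xi in zip(w, x)]
    return sum(contributions), contributions
```

With $W = I$ and $b = 0$, the weight vector aligns perfectly with the input, so the output equals $\lVert x \rVert$ and the contributions sum to it exactly.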
Voxel-based multiple testing is widely used in neuroimaging data analysis.
Traditional false discovery rate (FDR) control methods often ignore the spatial
dependence among the voxel-based tests and thus suffer from substantial loss of
testing power. While recent spatial FDR control methods have emerged, their
validity and optimality remain questionable when handling the complex spatial
dependencies of the brain. Concurrently, deep learning methods have
revolutionized image segmentation, a task closely related to voxel-based
multiple testing. In this paper, we propose DeepFDR, a novel spatial FDR
control method that leverages unsupervised deep learning-based image
segmentation to address the voxel-based multiple testing problem. Numerical
studies, including comprehensive simulations and Alzheimer's disease FDG-PET
image analysis, demonstrate DeepFDR's superiority over existing methods.
DeepFDR not only excels in FDR control and effectively diminishes the false
nondiscovery rate, but also boasts exceptional computational efficiency highly
suited for tackling large-scale neuroimaging data.
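The abstract does not detail how DeepFDR turns its segmentation output into rejections. One common rule in the spatial FDR literature is sketched below under that assumption. It starts from each voxel's posterior probability of being null (the local index of significance, LIS) and sorts voxels by LIS. It then rejects the largest number $k$ of them whose average LIS is at most the target level $\alpha$. The function name is hypothetical.

```python
from typing import List, Sequence


def lis_step_up(lis: Sequence[float], alpha: float) -> List[int]:
    """Reject the k voxels with the smallest LIS, where k is the largest
    number whose average LIS (an estimate of the FDR) is at most alpha."""
    order = sorted(range(len(lis)), key=lambda i: lis[i])
    running_sum, k = 0.0, 0
    for rank, i in enumerate(order, start=1):
        running_sum += lis[i]
        if running_sum / rank <= alpha:
            k = rank
    return sorted(order[:k])
```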
A default assumption in reinforcement learning (RL) and optimal control is
that observations arrive at discrete time points on a fixed clock cycle. Yet,
many applications involve continuous-time systems where the time
discretization, in principle, can be managed. The impact of time discretization
on RL methods has not been fully characterized in existing theory, but a more
detailed analysis of its effect could reveal opportunities for improving
data-efficiency. We address this gap by analyzing Monte-Carlo policy evaluation
for LQR systems and uncover a fundamental trade-off between approximation and
statistical error in value estimation. Importantly, these two errors respond
differently to time discretization, leading to an optimal choice of temporal
resolution for a given data budget. These findings show that managing the
temporal resolution can provably improve policy evaluation efficiency in LQR
systems with finite data. Empirically, we demonstrate the trade-off in
numerical simulations of LQR instances and standard RL benchmarks for
non-linear continuous control.
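The approximation side of the trade-off can be seen in a deliberately simple case. The sketch assumes a deterministic, stable scalar system $x(t) = x_0 e^{a t}$ with discounted quadratic cost. The exact value is $V = q x_0^2/(\rho - 2a)$ for $\rho > 2a$. A left Riemann sum on a grid of step $h$ has the closed form $h q x_0^2 / (1 - e^{(2a - \rho) h})$. The sketch omits noise, so it shows only that the approximation error shrinks with $h$; the statistical error, which the abstract notes behaves differently, is not modeled. Parameter values are illustrative.

```python
import math


def exact_value(x0: float, a: float, q: float, rho: float) -> float:
    """V = int_0^inf exp(-rho t) * q * x(t)^2 dt for x(t) = x0 * exp(a t); requires rho > 2a."""
    return q * x0**2 / (rho - 2.0 * a)


def discretized_value(x0: float, a: float, q: float, rho: float, h: float) -> float:
    """Left Riemann sum of the same integral on the grid t_k = k * h (a geometric series)."""
    r = math.exp((2.0 * a - rho) * h)  # ratio between consecutive grid terms
    return h * q * x0**2 / (1.0 - r)


def approximation_error(h: float) -> float:
    # Illustrative stable scalar system: a = -0.5, q = 1, rho = 0.1, x0 = 1.
    return abs(discretized_value(1.0, -0.5, 1.0, 0.1, h) - exact_value(1.0, -0.5, 1.0, 0.1))
```

For these values, the error is roughly 0.05 at $h = 0.1$ and roughly 0.005 at $h = 0.01$.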
Efficient training of large-scale graph neural networks (GNNs) has been
studied with a specific focus on reducing their memory consumption. Work by Liu
et al. (2022) proposed extreme activation compression (EXACT), which
demonstrated a drastic reduction in memory consumption by quantizing the
intermediate activation maps down to INT2 precision. They showed
little to no reduction in performance while achieving large reductions in GPU
memory consumption. In this work, we present an improvement to the EXACT
strategy by using block-wise quantization of the intermediate activation maps.
We experimentally analyze different block sizes and show further reduction in
memory consumption (>15%), and runtime speedup per epoch (about 5%) even when
performing extreme extents of quantization with similar performance trade-offs
as with the original EXACT. Further, we present a correction to the assumptions
on the distribution of intermediate activation maps in EXACT (assumed to be
uniform) and show improved variance estimations of the quantization and
dequantization steps.
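Block-wise quantization can be sketched as follows. The sketch assumes per-block min-max scaling to `bits`-bit integer codes with deterministic round-to-nearest. EXACT itself uses stochastic rounding, and the paper's block layout may differ. `quantize_blockwise` and `dequantize_blockwise` are hypothetical names. Storing one (minimum, scale) pair per block lets each block use its own range, so a block of small activations is not forced onto a scale set by large values elsewhere.

```python
from typing import List, Sequence, Tuple

Block = Tuple[float, float, List[int]]  # (block minimum, scale, integer codes)


def quantize_blockwise(values: Sequence[float], block_size: int, bits: int = 2) -> List[Block]:
    """Min-max quantize each block of `block_size` values to integers in [0, 2**bits - 1]."""
    levels = 2**bits - 1
    blocks: List[Block] = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        lo, hi = min(block), max(block)
        scale = (hi - lo) / levels if hi > lo else 1.0  # constant block: any scale works
        # round() is round-half-to-even; EXACT would use stochastic rounding instead.
        codes = [round((v - lo) / scale) for v in block]
        blocks.append((lo, scale, codes))
    return blocks


def dequantize_blockwise(blocks: Sequence[Block]) -> List[float]:
    """Map codes back to floats with each block's own minimum and scale."""
    restored: List[float] = []
    for lo, scale, codes in blocks:
        restored.extend(lo + c * scale for c in codes)
    return restored
```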
Transfer learning and ensembling are two popular techniques for improving the
performance and robustness of neural networks. Due to the high cost of
pre-training, ensembles of models fine-tuned from a single pre-trained
checkpoint are often used in practice. Such models end up in the same basin of
the loss landscape, which we call the pre-train basin, and thus have limited
diversity. In this work, we show that ensembles trained from a single
pre-trained checkpoint may be improved by better exploring the pre-train basin;
however, leaving the basin results in losing the benefits of transfer learning
and in degradation of the ensemble quality. Based on the analysis of existing
exploration methods, we propose StarSSE, a more effective modification of
Snapshot Ensembles (SSE) for the transfer learning setup, which results in stronger
ensembles and uniform model soups.
This paper proposes a new easy-to-implement parameter-free gradient-based
optimizer: DoWG (Distance over Weighted Gradients). We prove that DoWG is
efficient -- matching the convergence rate of optimally tuned gradient descent
in convex optimization up to a logarithmic factor without tuning any
parameters, and universal -- automatically adapting to both smooth and
nonsmooth problems. While popular algorithms following the AdaGrad framework
compute a running average of the squared gradients to use for normalization,
DoWG maintains a new distance-based weighted version of the running average,
which is crucial to achieve the desired properties. To complement our theory,
we also show empirically that DoWG trains at the edge of stability, and
validate its effectiveness on practical machine learning tasks.
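The following is a minimal sketch of the DoWG update as we understand it, for illustration only. It keeps the running maximum distance $\bar r_t = \max(\lVert x_t - x_0\rVert, \bar r_{t-1})$, initialized to a small $r_\epsilon$. It accumulates $v_t = v_{t-1} + \bar r_t^2 \lVert g_t\rVert^2$ and steps with $\eta_t = \bar r_t^2 / \sqrt{v_t}$. Consult the paper for the authoritative algorithm; the default `r_eps` here is an arbitrary choice.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def dowg(grad: Callable[[Vector], Vector], x0: Vector, steps: int,
         r_eps: float = 1e-3) -> Tuple[Vector, List[float]]:
    """DoWG sketch: returns the final iterate and the step sizes eta_t used."""
    x = list(x0)
    r_bar = r_eps  # running max distance from x0 (initialised to r_eps)
    v = 0.0        # distance-weighted sum of squared gradient norms
    etas: List[float] = []
    for _ in range(steps):
        g = grad(x)
        r_bar = max(r_bar, math.sqrt(sum((xi - x0i) ** 2 for xi, x0i in zip(x, x0))))
        v += r_bar**2 * sum(gi * gi for gi in g)
        if v == 0.0:  # zero gradient so far: already stationary
            break
        eta = r_bar**2 / math.sqrt(v)
        x = [xi - eta * gi for xi, gi in zip(x, g)]
        etas.append(eta)
    return x, etas
```

On $f(x) = (x - 3)^2$ started from $x_0 = 0$, the step size grows from a tiny initial value without any tuning, and the iterates settle at the minimizer.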
Understanding a model's sensitivity to its training data is crucial but can
also be challenging and costly, especially during training. To simplify such
issues, we present the Memory-Perturbation Equation (MPE), which relates a model's
sensitivity to perturbations in its training data. Derived using Bayesian
principles, the MPE unifies existing sensitivity measures, generalizes them to
a wide variety of models and algorithms, and unravels useful properties
regarding sensitivities. Our empirical results show that sensitivity estimates
obtained during training can be used to faithfully predict generalization on
unseen test data. The proposed equation is expected to be useful for future
research on robust and adaptive learning.
We revisit processes generated by iterated random functions driven by a
stationary and ergodic sequence. Such a process is called strongly stable if a
random initialization exists, for which the process is stationary and ergodic,
and for any other initialization, the difference of the two processes converges
to zero almost surely. Under some mild conditions on the corresponding
recursive map, without any condition on the driving sequence, we show the
strong stability of iterations. Several applications are surveyed, such as
stochastic approximation and queuing. Furthermore, new results are deduced for
Langevin-type iterations with dependent noise and for multitype branching
processes.
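The coupling idea behind strong stability can be illustrated with the simplest contractive recursion, a random affine map X_{n+1} = a_n X_n + b_n driven by i.i.d. (hence stationary and ergodic) pairs (a_n, b_n). This is a textbook example chosen for illustration, not the paper's general conditions; the bound |a_n| <= 0.9 and the uniform/Gaussian driving distributions are assumptions. Two copies started far apart but fed the same driving sequence forget their initialization:

```python
import random
from typing import Tuple


def coupled_iterations(x0: float, y0: float, steps: int, seed: int = 0) -> Tuple[float, float]:
    """Iterate X_{n+1} = a_n X_n + b_n from two starting points with the SAME driving noise."""
    rng = random.Random(seed)
    x, y = x0, y0
    for _ in range(steps):
        # Assumed driving sequence: i.i.d. pairs (a_n, b_n).
        a = rng.uniform(-0.9, 0.9)  # |a_n| <= 0.9 makes every map a contraction
        b = rng.gauss(0.0, 1.0)
        x = a * x + b
        y = a * y + b
    # In exact arithmetic x - y = (x0 - y0) * a_0 * ... * a_{steps-1}.
    return x, y


x, y = coupled_iterations(100.0, -50.0, steps=500)
print(abs(x - y))  # essentially zero: the initialization has been forgotten
```

Because the difference is multiplied by a factor of modulus at most 0.9 at every step, it decays geometrically regardless of how the b_n behave.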
( 2
min )
We examine how the mutual information between the output model and the
empirical sample relates to the generalization of the algorithm in the
context of stochastic convex optimization. Despite increasing interest in
information-theoretic generalization bounds, it is unclear whether these bounds
can provide insight into the exceptional performance of various learning
algorithms. Our study of stochastic convex optimization reveals that, for true
risk minimization, dimension-dependent mutual information is necessary. This
indicates that existing information-theoretic generalization bounds fall short
in capturing the generalization capabilities of algorithms like SGD and
regularized ERM, which have dimension-independent sample complexity.
( 2
min )
Voxel-based multiple testing is widely used in neuroimaging data analysis.
Traditional false discovery rate (FDR) control methods often ignore the spatial
dependence among the voxel-based tests and thus suffer from substantial loss of
testing power. While recent spatial FDR control methods have emerged, their
validity and optimality remain questionable when handling the complex spatial
dependencies of the brain. Concurrently, deep learning methods have
revolutionized image segmentation, a task closely related to voxel-based
multiple testing. In this paper, we propose DeepFDR, a novel spatial FDR
control method that leverages unsupervised deep learning-based image
segmentation to address the voxel-based multiple testing problem. Numerical
studies, including comprehensive simulations and Alzheimer's disease FDG-PET
image analysis, demonstrate DeepFDR's superiority over existing methods.
DeepFDR not only excels in FDR control and effectively reduces the false
nondiscovery rate, but is also computationally efficient, making it well
suited to large-scale neuroimaging data.
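For reference, the classical Benjamini-Hochberg (BH) step-up procedure is a typical example of a traditional FDR control method that treats each voxel's test independently and ignores spatial dependence. The sketch below is BH itself, not DeepFDR; the function name and the level `q` are illustrative:

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], q: float = 0.05) -> List[int]:
    """Return the indices of hypotheses rejected by the BH step-up procedure at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p-value
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank  # step-up: keep the LARGEST rank that passes its threshold
    return sorted(order[:k_max])


# Each voxel is tested in isolation; neighbouring voxels play no role.
print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.042,
                          0.06, 0.074, 0.205, 0.212, 0.216]))  # [0, 1]
```

Spatial FDR methods such as the one described above aim to recover power that this voxel-by-voxel thresholding leaves on the table.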
( 2
min )
A default assumption in reinforcement learning (RL) and optimal control is
that observations arrive at discrete time points on a fixed clock cycle. Yet,
many applications involve continuous-time systems where the time
discretization, in principle, can be managed. The impact of time discretization
on RL methods has not been fully characterized in existing theory, but a more
detailed analysis of its effect could reveal opportunities for improving
data-efficiency. We address this gap by analyzing Monte-Carlo policy evaluation
for LQR systems and uncover a fundamental trade-off between approximation and
statistical error in value estimation. Importantly, these two errors respond
differently to time discretization, leading to an optimal choice of temporal
resolution for a given data budget. These findings show that managing the
temporal resolution can provably improve policy evaluation efficiency in LQR
systems with finite data. Empirically, we demonstrate the trade-off in
numerical simulations of LQR instances and standard RL benchmarks for
non-linear continuous control.
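The approximation-error half of this trade-off can be seen on a noise-free scalar system. Assuming dynamics x'(t) = a x(t) with a < 0 and running cost q x(t)^2 (both illustrative choices, not the paper's LQR setup), the value is a q x0^2 / (-2a) integral, and a left Riemann sum on a grid of step h approximates it with an error that shrinks as h does. The statistical error from finite, noisy data, which grows as a fixed sample budget is spread over finer grids, is not modeled here:

```python
import math


def discretized_cost(a: float, q: float, x0: float, h: float, horizon: float = 20.0) -> float:
    """Left Riemann sum of q * x(t)^2 on a grid of step h, with x(t) = x0 * exp(a t)."""
    n = int(round(horizon / h))
    return h * sum(q * (x0 * math.exp(a * k * h)) ** 2 for k in range(n))


def exact_cost(a: float, q: float, x0: float) -> float:
    """Closed-form integral of q * x0^2 * exp(2 a t) over [0, inf), valid for a < 0."""
    return q * x0 ** 2 / (-2.0 * a)


for h in (0.1, 0.01):
    err = abs(discretized_cost(-1.0, 1.0, 1.0, h) - exact_cost(-1.0, 1.0, 1.0))
    print(h, err)  # roughly 0.05 for h = 0.1 and 0.005 for h = 0.01
```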
( 2
min )
Transfer learning and ensembling are two popular techniques for improving the
performance and robustness of neural networks. Due to the high cost of
pre-training, ensembles of models fine-tuned from a single pre-trained
checkpoint are often used in practice. Such models end up in the same basin of
the loss landscape, which we call the pre-train basin, and thus have limited
diversity. In this work, we show that ensembles trained from a single
pre-trained checkpoint may be improved by better exploring the pre-train basin;
however, leaving the basin results in losing the benefits of transfer learning
and in degradation of the ensemble quality. Based on an analysis of existing
exploration methods, we propose StarSSE, a more effective modification of
Snapshot Ensembles (SSE) for the transfer learning setup, which results in
stronger ensembles and uniform model soups.
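Snapshot Ensembles rely on a cyclic cosine-annealing learning rate: the rate restarts at its maximum at the beginning of each cycle, decays towards zero, and a model snapshot is saved at the end of every cycle. A minimal sketch of that original SSE schedule follows; the StarSSE modification is not reproduced, and the parameter names are illustrative:

```python
import math


def snapshot_lr(step: int, total_steps: int, n_cycles: int, lr0: float) -> float:
    """Cyclic cosine-annealing learning rate; a snapshot is taken at the end of each cycle."""
    cycle_len = math.ceil(total_steps / n_cycles)
    pos = step % cycle_len  # position inside the current cycle
    return lr0 / 2.0 * (math.cos(math.pi * pos / cycle_len) + 1.0)


# 100 steps split into 4 cycles of 25 steps each.
schedule = [snapshot_lr(s, total_steps=100, n_cycles=4, lr0=0.1) for s in range(100)]
print(schedule[0], schedule[24], schedule[25])  # restart at 0.1 after each cycle
```

The large rate at each restart is what pushes the model to a new point; in the transfer setting the question raised above is how far such restarts can move before leaving the pre-train basin.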
( 2
min )
Efficient training of large-scale graph neural networks (GNNs) has been
studied with a specific focus on reducing their memory consumption. Liu
et al. (2022) proposed extreme activation compression (EXACT), which
drastically reduced memory consumption by quantizing the intermediate
activation maps down to INT2 precision. They showed
little to no reduction in performance while achieving large reductions in GPU
memory consumption. In this work, we improve the EXACT
strategy by using block-wise quantization of the intermediate activation maps.
We experimentally analyze different block sizes and show a further reduction in
memory consumption (>15%) and a per-epoch runtime speedup of about 5%, even when
performing extreme quantization, with performance trade-offs similar to those
of the original EXACT. Further, we present a correction to the assumptions
on the distribution of intermediate activation maps in EXACT (assumed to be
uniform) and show improved variance estimations of the quantization and
dequantization steps.
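A block-wise quantizer of the kind described above can be sketched as follows: each block of activations gets its own (min, scale) pair, values are mapped to 2-bit codes with stochastic rounding, and dequantization reverses the affine map. The per-block min/max scaling, the stochastic rounding, and all function names are assumptions for illustration; this is not the EXACT code, and a real implementation would operate on GPU tensors:

```python
import math
import random
from typing import List, Optional, Tuple


def quantize_blockwise(x: List[float], block_size: int, bits: int = 2,
                       rng: Optional[random.Random] = None) -> Tuple[List[int], List[float], List[float]]:
    """Quantize x to `bits`-bit integer codes with one (min, scale) pair per block."""
    levels = 2 ** bits - 1  # INT2 -> codes 0..3
    rng = rng or random.Random(0)
    codes: List[int] = []
    mins: List[float] = []
    scales: List[float] = []
    for start in range(0, len(x), block_size):
        block = x[start:start + block_size]
        lo, hi = min(block), max(block)
        scale = (hi - lo) / levels if hi > lo else 1.0  # constant block: any scale works
        for v in block:
            t = (v - lo) / scale  # real-valued position in [0, levels]
            f = math.floor(t)
            # Stochastic rounding: round up with probability equal to the fractional
            # part, so the dequantized value is an unbiased estimate of v.
            q = f + (1 if rng.random() < t - f else 0)
            codes.append(min(max(q, 0), levels))
        mins.append(lo)
        scales.append(scale)
    return codes, mins, scales


def dequantize_blockwise(codes: List[int], mins: List[float], scales: List[float],
                         block_size: int) -> List[float]:
    """Map each code back to min + code * scale of the block it belongs to."""
    return [mins[i // block_size] + c * scales[i // block_size] for i, c in enumerate(codes)]
```

Smaller blocks track local ranges more tightly (smaller per-element error) at the cost of storing more (min, scale) pairs, which is the block-size trade-off the abstract refers to.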
( 2
min )
We develop a novel deep learning approach for pricing European basket options
written on assets that follow jump-diffusion dynamics. The option pricing
problem is formulated as a partial integro-differential equation, which is
approximated via a new implicit-explicit minimizing movement time-stepping
approach, involving approximation by deep, residual-type Artificial Neural
Networks (ANNs) for each time step. The integral operator is discretized via
two different approaches: a) a sparse-grid Gauss--Hermite approximation
following localised coordinate axes arising from singular value decompositions,
and b) an ANN-based high-dimensional special-purpose quadrature rule.
Crucially, the proposed ANN is constructed to ensure the asymptotic behavior of
the solution for large values of the underlyings and also leads to consistent
outputs with respect to a priori known qualitative properties of the solution.
The performance of the methods and their robustness with respect to dimension are
assessed in a series of numerical experiments involving the Merton
jump-diffusion model.
( 2
min )
We study the convergence of stochastic gradient descent (SGD) for non-convex
objective functions. We establish local convergence with positive
probability under the local Łojasiewicz condition introduced by Chatterjee
\cite{chatterjee2022convergence} and an additional local structural
assumption on the loss function landscape. A key component of our proof is to
ensure that the entire SGD trajectory stays inside the local region with
positive probability. We also provide examples of neural networks with finite
widths for which our assumptions hold.
( 2
min )
Fixed point lattice actions are designed to have continuum classical
properties unaffected by discretization effects and reduced lattice artifacts
at the quantum level. They provide a possible way to extract continuum physics
with coarser lattices, thereby allowing one to circumvent problems with critical
slowing down and topological freezing toward the continuum limit. A crucial
ingredient for practical applications is to find an accurate and compact
parametrization of a fixed point action, since many of its properties are only
implicitly defined. Here we use machine learning methods to revisit the
question of how to parametrize fixed point actions. In particular, we obtain a
fixed point action for four-dimensional SU(3) gauge theory using convolutional
neural networks with exact gauge invariance. The large operator space allows us
to find superior parametrizations compared to previous studies, a necessary
first step for future Monte Carlo simulations.
( 2
min )
This paper proposes a new, easy-to-implement, parameter-free, gradient-based
optimizer: DoWG (Distance over Weighted Gradients). We prove that DoWG is
efficient -- matching the convergence rate of optimally tuned gradient descent
in convex optimization up to a logarithmic factor without tuning any
parameters, and universal -- automatically adapting to both smooth and
nonsmooth problems. While popular algorithms following the AdaGrad framework
compute a running average of the squared gradients to use for normalization,
DoWG maintains a new distance-based weighted version of the running average,
which is crucial to achieve the desired properties. To complement our theory,
we also show empirically that DoWG trains at the edge of stability, and
validate its effectiveness on practical machine learning tasks.
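The distance-weighted normalization can be sketched in a few lines. This assumes the update as we understand it: keep r_t, the largest distance travelled from the initial point so far (floored at a small r_eps); accumulate v_t = v_{t-1} + r_t^2 * ||g_t||^2; and step with eta_t = r_t^2 / sqrt(v_t). Treat it as an illustrative sketch rather than the reference implementation; `r_eps` and the function names are assumptions:

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def _norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(c * c for c in v))


def dowg(grad: Callable[[Vector], Vector], x0: Sequence[float],
         steps: int, r_eps: float = 1e-4) -> List[Vector]:
    """Run DoWG-style updates and return the iterates x_0, ..., x_steps."""
    x_init = list(x0)
    x = list(x0)
    r_bar = r_eps  # running max distance from the initial point, floored at r_eps
    v = 0.0        # distance-weighted sum of squared gradient norms
    iterates = [list(x)]
    for _ in range(steps):
        g = grad(x)
        r_bar = max(r_bar, _norm([a - b for a, b in zip(x, x_init)]))
        v += r_bar ** 2 * _norm(g) ** 2
        if v == 0.0:  # zero gradient so far: already at a stationary point
            break
        eta = r_bar ** 2 / math.sqrt(v)  # no learning rate to tune
        x = [xi - eta * gi for xi, gi in zip(x, g)]
        iterates.append(list(x))
    return iterates


# Minimize f(x) = (x - 3)^2 starting from 0.
its = dowg(lambda x: [2.0 * (x[0] - 3.0)], [0.0], steps=200, r_eps=0.5)
print(its[-1][0])  # close to 3.0
```

Unlike AdaGrad-norm, each squared gradient is weighted by the current distance estimate, so early steps with a tiny r_bar contribute little to v and the step size can grow quickly while the iterates are still far from the solution.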
( 2
min )
Generalized linear regressions, such as logistic or Poisson regression, are
long-studied regression approaches that are widely applied to classification
problems. Our study formulates a stochastic generalized linear regression
model as a stochastic problem with chance constraints and tackles it using
nonconvex programming techniques. Clustering techniques and quantile
estimation are also used to estimate the mean and variance-covariance matrix
of the random data. The model's efficacy is assessed with standard
classification metrics: the F1 score, precision, and recall. On the same
dataset, the proposed algorithm scored 1 to 2 percent higher than ordinary
logistic regression on these metrics.
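For reference, the three metrics are computed from true positives (TP), false positives (FP), and false negatives (FN): precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean. Returning 0.0 when a ratio is undefined is a convention assumed here:

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall and F1 for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


print(binary_metrics([1, 1, 1, 0], [1, 0, 0, 0]))
# {'precision': 1.0, 'recall': 0.333..., 'f1': 0.5}
```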
( 2
min )
Multi-view data arise frequently in modern network analysis, for example,
relations of multiple types among individuals in social network analysis,
longitudinal measurements of interactions among observational units, and
annotated networks with noisy partial labeling of vertices. We study community detection in these
disparate settings via a unified theoretical framework, and investigate the
fundamental thresholds for community recovery. We characterize the mutual
information between the data and the latent parameters, provided the degrees
are sufficiently large. Based on this general result, (i) we derive a sharp
threshold for community detection in an inhomogeneous multilayer block model
\citep{chen2022global}, (ii) characterize a sharp threshold for weak recovery
in a dynamic stochastic block model \citep{matias2017statistical}, and (iii)
identify the limiting mutual information in an unbalanced partially labeled
block model. Our first two results are derived modulo coordinate-wise convexity
assumptions on specific functions; we provide extensive numerical evidence
for their correctness. Finally, we introduce iterative algorithms based on
Approximate Message Passing for community detection in these problems.
( 2
min )
This paper proposes two methods for causal additive models with unobserved
variables (CAM-UV). CAM-UV assumes that the causal functions take the form of
generalized additive models and that latent confounders are present. First, we
propose a method that leverages prior knowledge for efficient causal discovery.
Then, we propose an extension of this method for inferring causality in time
series data. The original CAM-UV algorithm differs from other existing causal
function models in that it does not seek the causal order between observed
variables, but rather aims to identify the causes for each observed variable.
Accordingly, the first proposed method in this paper utilizes prior knowledge,
such as knowing that certain variables cannot be causes of specific
others. Moreover, by incorporating the prior knowledge that causes precede
their effects in time, we extend the first algorithm to the second method for
causal discovery in time series data. We validate the first proposed method by
using simulated data to demonstrate that the accuracy of causal discovery
increases as more prior knowledge is accumulated. Additionally, we test the
second proposed method by comparing it with existing time series causal
discovery methods, using both simulated data and real-world data.
( 3
min )
In this paper, we propose a probabilistic reduced-dimensional vector
autoregressive (PredVAR) model to extract low-dimensional dynamics from
high-dimensional noisy data. The model utilizes an oblique projection to
partition the measurement space into a subspace that accommodates the
reduced-dimensional dynamics and a complementary static subspace. An optimal
oblique decomposition is derived that achieves the best predictability in terms
of the prediction error covariance. Building on this, we develop an iterative PredVAR
algorithm using maximum likelihood and the expectation-maximization (EM)
framework. This algorithm alternately updates the estimates of the latent
dynamics and optimal oblique projection, yielding dynamic latent variables with
rank-ordered predictability and an explicit latent VAR model that is consistent
with the outer projection model. The superior performance and efficiency of the
proposed approach are demonstrated using data sets from a synthesized Lorenz
system and an industrial process from Eastman Chemical.
( 2
min )
Optimizing neural networks is a difficult task that is still not well
understood. On the other hand, fixed representation methods such as kernels and
random features have provable optimization guarantees but inferior performance
due to their inherent inability to learn the representations. In this paper, we
aim at bridging this gap by presenting a novel architecture called RedEx
(Reduced Expander Extractor) that is as expressive as neural networks and can
also be trained in a layer-wise fashion via a convex program with semi-definite
constraints and optimization guarantees. We also show that RedEx provably
surpasses fixed representation methods, in the sense that it can efficiently
learn a family of target functions which fixed representation methods cannot.
( 2
min )
Today, we’re excited to announce the availability of Llama 2 inference and fine-tuning support on AWS Trainium and AWS Inferentia instances in Amazon SageMaker JumpStart. Using AWS Trainium and Inferentia based instances, through SageMaker, can help users lower fine-tuning costs by up to 50%, and lower deployment costs by 4.7x, while lowering per token latency. […]
( 18
min )
Geospatial data is data about specific locations on the earth’s surface. It can represent a geographical area as a whole or it can represent an event associated with a geographical area. Analysis of geospatial data is sought after in a few industries. It involves understanding where the data exists from a spatial perspective and why […]
( 13
min )
An interdisciplinary team of researchers thinks health AI could benefit from some of the aviation industry’s long history of hard-won lessons that have created one of the safest activities today.
( 11
min )
The AI Podcast · DigitalPath’s Ethan Higgins On Using AI to Fight Wildfires – Ep. 211 DigitalPath is igniting change in the Golden State, using computer vision, generative adversarial networks and a network of thousands of cameras to detect signs of fire in real time. In the latest episode of NVIDIA’s AI Podcast, host […]
( 6
min )
Traditional relational databases struggle with unstructured data – the text, images, videos, and social media feeds that flood our modern world. But graph databases, with their unique structure, offer a powerful tool for taming this chaos and extracting valuable insights. Here’s how they bring a game-changing perspective to unstructured data analytics: Modeling relationships, not just…
The post Graph databases: Unveiling the hidden connections in unstructured data appeared first on Data Science Central.
( 22
min )
Customer success stories illuminate how hardware accelerators speed necessary infrastructure to support all aspects of an accelerated AI and HPC computing datacenter.
The post Use cases show that on-package accelerators benefit HPC/AI workloads from computation to data movement and security appeared first on Data Science Central.
( 27
min )
OpenAI Whisper is an advanced automatic speech recognition (ASR) model with an MIT license. ASR technology finds utility in transcription services, voice assistants, and enhancing accessibility for individuals with hearing impairments. This state-of-the-art model is trained on a vast and diverse dataset of multilingual and multitask supervised data collected from the web. Its high accuracy […]
( 11
min )
The Global Health Drug Discovery Institute and Microsoft Research are using AI to innovate in life sciences by accelerating the development of new treatments for global infectious diseases like tuberculosis and COVID. Find out how.
The post GHDDI and Microsoft Research use AI technology to achieve significant progress in discovering new drugs to treat global infectious diseases appeared first on Microsoft Research.
( 11
min )
Indigenous languages are under threat. Some 3,000 (three-quarters of the total) could disappear before the end of the century, or one every two weeks, according to UNESCO. As part of a movement to protect such languages, New Zealand’s Te Hiku Media, a broadcaster focused on the Māori people’s indigenous language known as te […]
( 7
min )
Curiosity leads the way for this week’s featured In the NVIDIA Studio 3D artist, Brellias.
( 7
min )
We funded 10 teams from around the world to design ideas and tools to collectively govern AI. We summarize the innovations, outline our learnings, and call for researchers and engineers to join us as we continue this work.
( 6
min )
We study semi-supervised sequence generation tasks where labeled data are too
scarce to effectively finetune a model and at the same time few-shot prompting
of a large language model (LLM) has suboptimal performance. This happens when a
task, such as parsing, is expensive to annotate and also unfamiliar to a
pretrained LLM. In this paper, we find that student models
distilled from an in-context learned LLM can often generalize better than their
teacher on such tasks. Leveraging this finding, we present a new method --
multistage collaborative knowledge distillation from an LLM (MCKD) -- for such
tasks. MCKD first few-shot prompts an LLM to produce pseudolabels for unlabeled
data. At each intermediate knowledge distillation (KD) stage, a new pair of
students is trained on disjoint partitions of the pseudolabeled data. Each
student then produces new and improved pseudolabels for its unseen partition to
be used in the next stage of distillation. We demonstrate the advantage of
multistage cross-partition labeling on several syntactic and semantic parsing
tasks. On CRAFT biomedical parsing, for example, 3-stage MCKD with 50 labeled
examples outperforms the prompted LLM and vanilla KD by 7.5% and 3.7% parsing
F1, respectively, and matches the performance of supervised finetuning with 500
examples.
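The cross-partition pseudolabeling loop can be sketched as below. The `teacher_label` and `train` callables are hypothetical stand-ins for the few-shot-prompted LLM and for student finetuning, labels are integers only for illustration (parses in the paper), and keeping the same two-way split at every stage is an assumption of this sketch:

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Example = Hashable
Label = int  # parse structures in the paper; plain ints here for illustration
Labeler = Callable[[Example], Label]


def mckd(unlabeled: Sequence[Example],
         teacher_label: Labeler,
         train: Callable[[List[Tuple[Example, Label]]], Labeler],
         stages: int) -> Dict[Example, Label]:
    """Multistage cross-partition pseudolabeling (hypothetical sketch)."""
    # Stage 0: the prompted teacher pseudolabels all unlabeled data.
    labels = {x: teacher_label(x) for x in unlabeled}
    half = len(unlabeled) // 2
    part_a, part_b = list(unlabeled[:half]), list(unlabeled[half:])  # disjoint partitions
    for _ in range(stages):
        student_a = train([(x, labels[x]) for x in part_a])
        student_b = train([(x, labels[x]) for x in part_b])
        # Each student relabels the partition it was NOT trained on.
        new_labels = {x: student_a(x) for x in part_b}
        new_labels.update({x: student_b(x) for x in part_a})
        labels = new_labels
    return labels
```

Because no student ever relabels its own training data, a teacher error it was trained on is not simply copied back; a student trained on mostly correct pseudolabels can overwrite the mistakes in the other partition.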
( 3
min )
This letter proposes a novel relaying framework, semantic-forward (SF), for
cooperative communications towards the sixth-generation (6G) wireless networks.
The SF relay extracts and transmits semantic features, which reduces the
forwarding payload and also improves network robustness against intra-link
errors. Building on the theory of cooperative communications with side
information and the turbo principle, we design a joint source-channel coding
algorithm to iteratively exchange the extrinsic information for enhancing the
decoding gains at the destination. Surprisingly, simulation results indicate
that even in bad channel conditions, SF relaying can still effectively improve
the recovered information quality.
( 2
min )
Pedestrian intention prediction is crucial for autonomous driving. In
particular, knowing if pedestrians are going to cross in front of the
ego-vehicle is core to performing safe and comfortable maneuvers. Creating
accurate and fast models that predict such intentions from sequential images is
challenging. A factor contributing to this is the lack of datasets with diverse
crossing and non-crossing (C/NC) scenarios. We address this scarcity by
introducing a framework, named ARCANE, that allows programmatically generating
synthetic datasets consisting of C/NC video clip samples. As an example, we use
ARCANE to generate a large and diverse dataset named PedSynth. We show how
PedSynth complements widely used real-world datasets such as JAAD and PIE,
thus enabling more accurate models for C/NC prediction. Considering the onboard
deployment of C/NC prediction models, we also propose a deep model named
PedGNN, which is fast and has a very low memory footprint. PedGNN is based on a
GNN-GRU architecture that takes a sequence of pedestrian skeletons as input to
predict crossing intentions.
( 2
min )
In this paper we clarify the crucial difference between a deep neural network
and the Fourier series. For the multiple Fourier series of the periodization of
some radial functions on $\mathbb{R}^d$, Kuratsubo (2010) investigated the
behavior of the spherical partial sum and discovered a third phenomenon
besides the well-known Gibbs-Wilbraham and Pinsky phenomena. In particular,
this third phenomenon prevents pointwise convergence. In contrast,
we give a specific deep neural network and prove that it converges pointwise.
( 2
min )
The traditional role of the network layer is the transfer of packet replicas
from source to destination through intermediate network nodes. We present a
generative network layer that uses Generative AI (GenAI) at intermediate or
edge network nodes and analyze its impact on the required data rates in the
network. We conduct a case study where the GenAI-aided nodes generate images
from prompts that consist of substantially compressed latent representations.
The results from network flow analyses under image quality constraints show
that the generative network layer can achieve an improvement of more than 100%
in terms of the required data rate.
( 2
min )
The estimation of probability density functions is a nontrivial task that
in recent years has been tackled with machine learning techniques.
Successful applications can be obtained using models inspired by the Boltzmann
machine (BM) architecture. In this manuscript, the product Jacobi-Theta
Boltzmann machine (pJTBM) is introduced as a restricted version of the
Riemann-Theta Boltzmann machine (RTBM) with diagonal hidden sector connection
matrix. We show that score matching, based on the Fisher divergence, can be
used to fit probability densities with the pJTBM more efficiently than with the
original RTBM.
( 2
min )
The Model Parameter Randomisation Test (MPRT) is widely acknowledged in the
eXplainable Artificial Intelligence (XAI) community for its well-motivated
evaluative principle: that the explanation function should be sensitive to
changes in the parameters of the model function. However, recent works have
identified several methodological caveats for the empirical interpretation of
MPRT. To address these caveats, we introduce two adaptations to the original
MPRT -- Smooth MPRT and Efficient MPRT, where the former minimises the impact
that noise has on the evaluation results through sampling and the latter
circumvents the need for biased similarity measurements by re-interpreting the
test through the explanation's rise in complexity, after full parameter
randomisation. Our experimental results demonstrate that these proposed
variants lead to improved metric reliability, thus enabling a more trustworthy
application of XAI methods.
( 2
min )
We present a large-scale empirical study of how choices of configuration
parameters affect performance in knowledge distillation (KD). An example of
such a KD parameter is the measure of distance between the predictions of the
teacher and the student, common choices for which include the mean squared
error (MSE) and the KL-divergence. Although scattered efforts have been made to
understand the differences between such options, the KD literature still lacks
a systematic study on their general effect on student performance. We take an
empirical approach to this question in this paper, seeking to find out the
extent to which such choices influence student performance across 13 datasets
from 4 NLP tasks and 3 student sizes. We quantify the cost of making
sub-optimal choices and identify a single configuration that performs well
across the board.
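The two distance choices named above can be written down directly. In this sketch the KL divergence is taken from the teacher's to the student's temperature-softened distribution, and the MSE is applied to raw logits; both are common conventions but are assumptions here, not necessarily the configurations studied in the paper:

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_kl(teacher_logits: Sequence[float], student_logits: Sequence[float],
          temperature: float = 1.0) -> float:
    """KL(p_teacher || p_student) between temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def kd_mse(teacher_logits: Sequence[float], student_logits: Sequence[float]) -> float:
    """Mean squared error between raw logits."""
    return sum((t - s) ** 2 for t, s in zip(teacher_logits, student_logits)) / len(teacher_logits)


# A constant shift of the logits leaves the softmax unchanged, so KL is 0,
# while MSE on logits penalizes it.
print(kd_kl([1.0, 2.0, 3.0], [11.0, 12.0, 13.0]), kd_mse([1.0, 2.0, 3.0], [11.0, 12.0, 13.0]))
```

This shift-invariance difference is one concrete reason the two choices can lead to different student behavior.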
( 2
min )
In this work, we propose an approach for improving graph convolutional networks
(GCNs) for predicting ratings in social networks. Our model extends the standard
GCN with several transformer layers. The main focus of the
paper is on the encoder architecture for node embedding in the network. Taking
the embeddings produced by the graph convolution layers as input, the attention
mechanism rearranges the feature space to obtain a more effective embedding
for the downstream task. The experiments show that our proposed architecture
achieves better performance than a GCN on the traditional link prediction task.
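For context, the neighborhood aggregation of a standard GCN layer (in the Kipf and Welling formulation) multiplies the node features H by the symmetrically normalized adjacency D^{-1/2}(A + I)D^{-1/2}. The sketch below shows only that propagation step; the learnable weight matrix, the nonlinearity, and the transformer layers added in this work are omitted:

```python
import math
from typing import List

Matrix = List[List[float]]


def normalized_propagation(adj: Matrix, features: Matrix) -> Matrix:
    """Compute D^{-1/2} (A + I) D^{-1/2} H, the aggregation step of a GCN layer."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]  # add self-loops
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    d = len(features[0])
    return [[sum(norm[i][k] * features[k][c] for k in range(n)) for c in range(d)]
            for i in range(n)]


# Two connected nodes end up averaging their features.
print(normalized_propagation([[0, 1], [1, 0]], [[1.0], [3.0]]))  # [[2.0], [2.0]]
```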
( 2
min )
Feature selection in noisy label scenarios remains an understudied topic. We
propose a novel genetic algorithm-based approach, the Noise-Aware
Multi-Objective Feature Selection Genetic Algorithm (NMFS-GA), for selecting
optimal feature subsets in binary classification with noisy labels. NMFS-GA
offers a unified framework for selecting feature subsets that are both accurate
and interpretable. We evaluate NMFS-GA on synthetic datasets with label noise,
a Breast Cancer dataset enriched with noisy features, and a real-world ADNI
dataset for dementia conversion prediction. Our results indicate that NMFS-GA
can effectively select feature subsets that improve the accuracy and
interpretability of binary classifiers in scenarios with noisy labels.
( 2
min )
We propose a Block Majorization Minimization method with Extrapolation (BMMe)
for solving a class of multi-convex optimization problems. The extrapolation
parameters of BMMe are updated using a novel adaptive update rule. By showing
that block majorization minimization can be reformulated as a block mirror
descent method, with the Bregman divergence adaptively updated at each
iteration, we establish subsequential convergence for BMMe. We use this method
to design efficient algorithms to tackle nonnegative matrix factorization
problems with the $\beta$-divergences ($\beta$-NMF) for $\beta\in [1,2]$. These
algorithms, which are multiplicative updates with extrapolation, benefit from
our novel results that offer convergence guarantees. We also empirically
illustrate the significant acceleration of BMMe for $\beta$-NMF through
extensive experiments.
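The base iteration that BMMe accelerates is the multiplicative update (MU) scheme. For beta = 2 (the Frobenius loss), the classical Lee-Seung updates are H <- H * (W^T V) / (W^T W H) and W <- W * (V H^T) / (W H H^T), applied elementwise. The sketch below shows only these plain updates; the extrapolation step and its adaptive parameter rule are not included, and the small `eps` guard, the initialization, and the names are assumptions:

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def frob_loss(v: Matrix, w: Matrix, h: Matrix) -> float:
    wh = matmul(w, h)
    return 0.5 * sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


def nmf_mu(v: Matrix, rank: int, iters: int, seed: int = 0,
           eps: float = 1e-12) -> Tuple[Matrix, Matrix, List[float]]:
    """Plain multiplicative updates for beta = 2 (no extrapolation)."""
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    w = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(m)]
    h = [[rng.uniform(0.1, 1.0) for _ in range(n)] for _ in range(rank)]
    losses = [frob_loss(v, w, h)]
    for _ in range(iters):
        wt = transpose(w)  # H <- H * (W^T V) / (W^T W H)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)] for i in range(rank)]
        ht = transpose(h)  # W <- W * (V H^T) / (W H H^T)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)] for i in range(m)]
        losses.append(frob_loss(v, w, h))
    return w, h, losses
```

Each update multiplies by a nonnegative ratio, so the factors stay nonnegative, and the loss is non-increasing; extrapolation aims to speed up this often slow descent.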
( 2
min )
This paper describes the use of connectionist techniques in phonetic speech
recognition with strong latency constraints. The constraints are imposed by the
task of deriving the lip movements of a synthetic face in real time from the
speech signal, by feeding the phonetic string into an articulatory synthesiser.
Particular attention has been paid to analysing the interaction between the
time evolution model learnt by the multi-layer perceptrons and the transition
model imposed by the Viterbi decoder, in different latency conditions. Two
experiments were conducted in which the time dependencies in the language model
(LM) were controlled by a parameter. The results show a strong interaction
between the three factors involved, namely the neural network topology, the
length of time dependencies in the LM and the decoder latency.
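For readers unfamiliar with the decoder side, a generic Viterbi decoder finds the most likely hidden-state sequence of an HMM given per-frame emission scores and a transition model. The sketch below is the textbook full-sequence version in log space; the latency-limited decoding studied in the paper and its MLP-derived emission scores are not reproduced, and the toy probabilities are hypothetical:

```python
import math
from typing import Dict, List, Sequence


def viterbi(obs: Sequence[str], states: Sequence[str], start: Dict[str, float],
            trans: Dict[str, Dict[str, float]], emit: Dict[str, Dict[str, float]]) -> List[str]:
    """Most likely state sequence, computed with log-probabilities to avoid underflow."""
    # score[s]: best log-probability of any path ending in state s at the current frame
    score = {s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}
    backptrs: List[Dict[str, str]] = []
    for o in obs[1:]:
        prev_best: Dict[str, str] = {}
        new_score: Dict[str, float] = {}
        for s in states:
            best_prev = max(states, key=lambda p: score[p] + math.log(trans[p][s]))
            prev_best[s] = best_prev
            new_score[s] = score[best_prev] + math.log(trans[best_prev][s]) + math.log(emit[s][o])
        backptrs.append(prev_best)
        score = new_score
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for bp in reversed(backptrs):
        path.append(bp[path[-1]])
    return list(reversed(path))


trans = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.3, "B": 0.7}}
emit = {"A": {"a": 0.9, "b": 0.1}, "B": {"a": 0.2, "b": 0.8}}
print(viterbi("aabb", ["A", "B"], {"A": 0.5, "B": 0.5}, trans, emit))  # ['A', 'A', 'B', 'B']
```

The traceback needs the whole observation sequence, which is exactly why a strict latency constraint forces the decoder to commit with limited lookahead.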
( 2
min )
Language barriers have long hindered effective communication and connection,
persisting as a challenge in our increasingly interconnected world. This
research paper introduces a transformative solution to this persistent obstacle:
an end-to-end speech conversion framework tailored for Hindi-to-English
translation, culminating in the synthesis of English audio. By integrating
cutting-edge technologies such as XLSR Wav2Vec2 for automatic speech
recognition (ASR), mBART for neural machine translation (NMT), and a
Text-to-Speech (TTS) synthesis component, this framework offers a unified and
seamless approach to cross-lingual communication. We delve into the intricate
details of each component, elucidating their individual contributions and
exploring the synergies that enable a fluid transition from spoken Hindi to
synthesized English audio.
( 2
min )
This paper introduces Qrlew, an open-source library that can parse SQL
queries into Relations, an intermediate representation that keeps track of
rich data types, value ranges, and row ownership, so that they can easily be
rewritten into differentially private equivalents and turned back into SQL
queries for execution in a variety of standard data stores.
With Qrlew, a data practitioner can express their data queries in standard
SQL; the data owner can run the rewritten query without any technical
integration and with strong privacy guarantees on the output; and the query
rewriting can be operated by a privacy expert who must be trusted by the owner,
but may belong to a separate organization.
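To see why tracking row ownership matters for such rewrites, consider what a differentially private `COUNT(*)` has to do. Each owner's contribution is capped so that adding or removing one owner changes the count by at most the cap (the sensitivity), and Laplace noise with scale cap / epsilon is then added. This is a minimal sketch of that standard mechanism under these assumptions. It is not Qrlew's API or its actual rewriting output, and all names are hypothetical:

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, Optional, Sequence


def laplace(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with rate 1/scale is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(owners: Sequence[Hashable], max_rows_per_owner: int, epsilon: float,
             rng: Optional[random.Random] = None) -> float:
    """Epsilon-DP COUNT(*) where each row is tagged with the owner it belongs to."""
    rng = rng or random.Random()
    per_owner: Dict[Hashable, int] = defaultdict(int)
    for o in owners:
        per_owner[o] += 1
    # Cap each owner's rows so one owner can move the count by at most max_rows_per_owner.
    clipped = sum(min(c, max_rows_per_owner) for c in per_owner.values())
    return clipped + laplace(max_rows_per_owner / epsilon, rng)


rows = ["alice"] * 50 + ["bob"] * 3 + ["carol"]
print(dp_count(rows, max_rows_per_owner=5, epsilon=1.0, rng=random.Random(0)))  # 9 plus noise
```

Without ownership information, the mechanism could not bound one person's influence on the result, which is what the privacy guarantee is stated in terms of.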
( 2
min )
To make antenna design on printed circuit boards (PCBs) accessible to more
engineers, we propose a simple method that models PCB antennas with
a few basic components. With two separate steps that decide their geometric
dimensions and positions, antenna prototypes can be produced without any
prior experience. Random-sampling statistics related to the quality of the
dimensions are used to select among dimension candidates. A novel
image-based classifier using a convolutional neural network (CNN) is introduced
to further determine the positions of these fixed-dimension components. Two
examples from wearable products have been chosen to examine the entire
workflow. Their final designs are realistic and their performance metrics are
not inferior to the ones designed by experienced engineers.
( 2
min )
We introduce a probabilistic technique for full-waveform inversion, employing
variational inference and conditional normalizing flows to quantify uncertainty
in migration-velocity models and its impact on imaging. Our approach integrates
generative artificial intelligence with physics-informed common-image gathers,
reducing reliance on accurate initial velocity models. The case studies considered
demonstrate its efficacy in producing realizations of migration-velocity models
conditioned on the data. These models are used to quantify amplitude and
positioning effects during subsequent imaging.
( 2
min )
Generative models of macromolecules have significant implications
for industrial and biomedical efforts in protein engineering. However, existing
methods are currently limited to modeling protein structures or sequences,
independently or jointly, without regard to the interactions that commonly
occur between proteins and other macromolecules. In this work, we introduce
MMDiff, a generative model that jointly designs sequences and structures of
nucleic acid and protein complexes, independently or in complex, using joint
SE(3)-discrete diffusion noise. Such a model has important implications for
emerging areas of macromolecular design including structure-based transcription
factor design and design of noncoding RNA sequences. We demonstrate the utility
of MMDiff through a rigorous new design benchmark for macromolecular complex
generation that we introduce in this work. Our results demonstrate that MMDiff
is able to successfully generate micro-RNA and single-stranded DNA molecules
while being modestly capable of jointly modeling DNA and RNA molecules in
interaction with multi-chain protein complexes. Source code:
https://github.com/Profluent-Internships/MMDiff.
De novo drug design is a pivotal issue in pharmacology and a new area of
focus in AI for science research. A central challenge in this field is to
generate molecules with specific properties while also producing a wide range
of diverse candidates. Although advanced technologies such as transformer
models and reinforcement learning have been applied in drug design, their
potential has not been fully realized. Therefore, we propose MolRL-MGPT, a
reinforcement learning algorithm with multiple GPT agents for generating drug
molecules. To promote molecular diversity, we encourage the agents to
collaborate in searching for desirable molecules in diverse directions. Our
algorithm has shown promising results on the GuacaMol benchmark and exhibits
efficacy in designing inhibitors against SARS-CoV-2 protein targets. The code
is available at: https://github.com/HXYfighter/MolRL-MGPT.
This paper introduces our system submission for the Cadenza ICASSP 2024 Grand
Challenge, which presents the problem of remixing and enhancing music for
hearing aid users. Our system placed first in the challenge, achieving the best
average Hearing-Aid Audio Quality Index (HAAQI) score on the evaluation data
set. We describe the system, which uses an ensemble of deep learning music
source separators that are fine-tuned on the challenge data. We demonstrate the
effectiveness of our system through the challenge results and analyze the
importance of different system aspects through ablation studies.
Document representation is the core of many NLP tasks on machine
understanding. A general representation learned in an unsupervised manner
retains generality and can be used for various applications. In practice,
sentiment analysis (SA) has been a challenging task that is regarded as
deeply semantic and is often used to assess general representations.
Existing methods for unsupervised document representation learning can be
separated into two families: sequential ones, which explicitly take the
ordering of words into consideration, and non-sequential ones, which do not
explicitly do so. However, each family suffers from its own weaknesses. In
this paper, we propose a model that overcomes difficulties encountered by both
families of methods. Experiments show that our model outperforms
state-of-the-art methods by a large margin on popular SA datasets and on a
fine-grained aspect-based SA task.
While significant advancements have been made in the field of fair machine
learning, the majority of studies focus on scenarios where the decision model
operates on a static population. In this paper, we study fairness in dynamic
systems where sequential decisions are made. Each decision may shift the
underlying distribution of features or user behavior. We model the dynamic
system through a Markov Decision Process (MDP). By acknowledging that
traditional fairness notions and long-term fairness are distinct requirements
that may not necessarily align with one another, we propose an algorithmic
framework to integrate various fairness considerations with reinforcement
learning using both pre-processing and in-processing approaches. Three case
studies show that our method can strike a balance between traditional fairness
notions, long-term fairness, and utility.
Realistic synthetic tabular data generation encounters significant challenges
in preserving privacy, especially when dealing with sensitive information in
domains like finance and healthcare. In this paper, we introduce
\textit{Federated Tabular Diffusion} (FedTabDiff) for generating high-fidelity
mixed-type tabular data without centralized access to the original tabular
datasets. Leveraging the strengths of \textit{Denoising Diffusion Probabilistic
Models} (DDPMs), our approach addresses the inherent complexities in tabular
data, such as mixed attribute types and implicit relationships. More
critically, FedTabDiff realizes a decentralized learning scheme that permits
multiple entities to collaboratively train a generative model while respecting
data privacy and locality. We extend DDPMs into the federated setting for
tabular data generation, which includes a synchronous update scheme and
weighted averaging for effective model aggregation. Experimental evaluations on
real-world financial and medical datasets attest to the framework's capability
to produce synthetic data that maintains high fidelity, utility, privacy, and
coverage.
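The abstract says only that FedTabDiff aggregates client models by weighted averaging within a synchronous update scheme. A common instance of that idea is FedAvg-style averaging, where each client's parameters are weighted by its local dataset size. The sketch below assumes that weighting and stores parameters as plain dicts of float lists; the function name `weighted_average` and this data layout are hypothetical, not FedTabDiff's actual code.

```python
from typing import Dict, List

# Hypothetical parameter layout: parameter name -> flat list of floats.
Params = Dict[str, List[float]]


def weighted_average(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Average client parameters, weighting each client by its sample count.

    Assumption: weights are proportional to local dataset size (FedAvg-style);
    FedTabDiff's exact weighting may differ.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need exactly one size per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    weights = [n / total for n in client_sizes]

    aggregated: Params = {}
    for name in client_params[0]:
        length = len(client_params[0][name])
        # Element-wise weighted sum of this parameter across all clients.
        aggregated[name] = [
            sum(w * params[name][i] for w, params in zip(weights, client_params))
            for i in range(length)
        ]
    return aggregated


# Two hypothetical clients holding 100 and 300 records respectively.
client_a = {"w": [1.0, 2.0], "b": [0.0]}
client_b = {"w": [3.0, 6.0], "b": [4.0]}
print(weighted_average([client_a, client_b], [100, 300]))
# prints {'w': [2.5, 5.0], 'b': [3.0]}
```

In a synchronous scheme, a server would call such a function once per round, after every participating client has reported its locally trained parameters, and then broadcast the result back to all clients.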
This paper tackles the challenge of automatically assessing physical
rehabilitation exercises for patients who perform the exercises without
clinician supervision. The objective is to provide a quality score to ensure
correct performance and achieve desired results. To achieve this goal, a new
graph-based model, the Dense Spatio-Temporal Graph Conv-GRU Network with
Transformer, is introduced. This model combines a modified version of STGCN and
transformer architectures for efficient handling of spatio-temporal data. The
key idea is to treat skeleton data as a graph that respects its non-linear
structure and to detect the joints that play the main role in each
rehabilitation exercise. Dense connections and GRU mechanisms are used to
rapidly process large 3D skeleton inputs and effectively model temporal
dynamics. The transformer encoder's attention mechanism focuses on relevant
parts of the input sequence, making it useful for evaluating rehabilitation
exercises. Evaluation of our proposed approach on the KIMORE and UI-PRMD
datasets highlights its potential: it surpasses state-of-the-art methods in
terms of accuracy and computational time, yielding faster and more accurate
learning and assessment of rehabilitation exercises. Additionally, our model
provides valuable feedback through qualitative illustrations that highlight
the significance of joints in specific exercises.
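STGCN-style models treat the skeleton as a graph and, at each frame, mix joint features along the bones with a graph convolution. A minimal pure-Python sketch of one such spatial step, assuming the common symmetric normalization D^{-1/2}(A + I)D^{-1/2} and a toy three-joint chain, is shown below. The joint layout, feature sizes, and function names are hypothetical, and the paper's modified STGCN (with dense connections, GRUs, and a transformer encoder) is not reproduced.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of an (n x k) and a (k x m) matrix."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def normalized_adjacency(edges: List[Tuple[int, int]], num_joints: int) -> Matrix:
    """Build D^{-1/2} (A + I) D^{-1/2} for an undirected skeleton graph."""
    # Start from the identity so each joint keeps its own features (self-loops).
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)] for i in range(num_joints)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [
        [a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
        for i in range(num_joints)
    ]


def spatial_graph_conv(x: Matrix, a_hat: Matrix, w: Matrix) -> Matrix:
    """One spatial step for a single frame: ReLU(A_hat @ X @ W)."""
    h = matmul(matmul(a_hat, x), w)
    return [[max(0.0, v) for v in row] for row in h]


# Hypothetical chain of three joints (e.g. shoulder-elbow-wrist), 2 features per joint.
edges = [(0, 1), (1, 2)]
a_hat = normalized_adjacency(edges, 3)
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = [[1.0], [1.0]]  # projects 2 input features to 1 output channel
print(spatial_graph_conv(x, a_hat, w))
```

A full spatio-temporal block would apply this step to every frame and then convolve or recur along the time axis, which is where the GRU mechanism described above would come in.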
Large Language Models (LLMs) hold transformative potential in aviation,
particularly in reconstructing flight trajectories. This paper investigates
this potential, grounded in the notion that LLMs excel at processing sequential
data and deciphering complex data structures. Utilizing the LLaMA 2 model, a
pre-trained open-source LLM, the study focuses on reconstructing flight
trajectories using Automatic Dependent Surveillance-Broadcast (ADS-B) data with
irregularities inherent in real-world scenarios. The findings demonstrate the
model's proficiency in filtering noise and estimating both linear and curved
flight trajectories. However, the analysis also reveals challenges in managing
longer data sequences, which may be attributed to the token length limitations
of LLMs. The study's insights underscore the promise of LLMs in flight
trajectory reconstruction and open new avenues for their broader application
across the aviation and transportation sectors.
The estimation of probability density functions is a nontrivial task that
in recent years has been tackled with machine learning techniques.
Successful applications can be obtained using models inspired by the Boltzmann
machine (BM) architecture. In this manuscript, the product Jacobi-Theta
Boltzmann machine (pJTBM) is introduced as a restricted version of the
Riemann-Theta Boltzmann machine (RTBM) with diagonal hidden sector connection
matrix. We show that score matching, based on the Fisher divergence, can be
used to fit probability densities with the pJTBM more efficiently than with the
original RTBM.
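For reference, the Fisher divergence between a data density p and a model density p_θ, and the standard score-matching objective obtained from it by integration by parts (assuming the usual regularity conditions, so that boundary terms vanish), are written out below. The pJTBM-specific score function is not given in the abstract and is not reproduced here.

```latex
D_F(p \,\|\, p_\theta)
  = \tfrac{1}{2}\,\mathbb{E}_{x \sim p}\!\left[
      \big\lVert \nabla_x \log p(x) - \nabla_x \log p_\theta(x) \big\rVert^2
    \right]
  = \mathbb{E}_{x \sim p}\!\left[
      \tfrac{1}{2}\big\lVert \nabla_x \log p_\theta(x) \big\rVert^2
      + \operatorname{tr}\!\big(\nabla_x^2 \log p_\theta(x)\big)
    \right] + C,
\qquad
C = \tfrac{1}{2}\,\mathbb{E}_{x \sim p}\big\lVert \nabla_x \log p(x) \big\rVert^2 .
```

Because C does not depend on θ, minimizing the second expectation fits the model using samples alone. For an energy-based density p_θ(x) = e^{-E_θ(x)}/Z_θ, the score ∇_x log p_θ(x) = -∇_x E_θ(x) does not involve Z_θ, so the normalizing constant never has to be evaluated; the abstract does not state whether this is the source of the reported efficiency gain over the RTBM.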
Sponsored Post: Attend the Data Science Symposium 2022 on November 8. The Center for Business Analytics at the University of Cincinnati will present its annual Data Science Symposium 2022 on November 8. This all-day, in-person event will have three featured speakers and two tech talk tracks with four concurrent presentations in each track. The […]
The post Attend the Data Science Symposium 2022, November 8 in Cincinnati appeared first on Machine Learning Mastery.